NEISSERIAL POLYNUCLEOTIDES

This application is a continuation-in-part of international patent application PCT/IB98/01665, filed 
October 9, 1998, from which priority is claimed under 35 U.S.C. § 119.

This invention relates to antigens from Neisseria bacteria. 

BACKGROUND ART

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, Gram-negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine candidate (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is 
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman et 
al (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med
337(14):970-976). In developing countries, endemic disease rates are much higher and during
epidemics incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely
high, at 10-20% in the United States, and much higher in developing countries. Following the 
introduction of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major 
cause of bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra). 

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been 
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan 
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and 
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in 
the United States and developed countries. The meningococcal vaccine currently in use is a 
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although 
efficacious in adolescents and adults, it induces a poor immune response and short duration of 
protection, and cannot be used in infants [eg. Morbidity and Mortality Weekly Report, Vol. 46, No.
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak 
immune response that cannot be boosted by repeated immunization. Following the success of the 
vaccination against H.influenzae, conjugate vaccines against serogroups A and C have been 
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines 
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et 
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate 
vaccine against meningococcus A and C. Vaccine 10:691-698). 

Meningococcus B remains a problem, however. This serotype currently is responsible for
approximately 50% of total meningitis in the United States, Europe, and South America. The
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer of
α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective
immune response, the capsular polysaccharide has, for instance, been chemically modified by
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine
candidates: capsular or non-capsular. Clin Microbiol Rev 7(4):559-575).

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous
strain, but are not effective at large, where there are many antigenic variants of the outer membrane
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine
different porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal
vaccine. Infect Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines
have been the opa and opc proteins, but none of these approaches have been able to overcome the
antigenic variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding
proteins 1 and 2 are both surface exposed and generate bactericidal antibodies capable of killing
homologous and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against
meningococcus B, some could be components of vaccines against all meningococcal serotypes, and
others could be components of vaccines against all pathogenic Neisseriae.

THE INVENTION

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the 
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence,
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These
homologous proteins include mutants and allelic variants of the sequences disclosed in the
examples. Typically, 50% identity or more between two proteins is considered to be an indication
of functional equivalence. Identity between the proteins is preferably determined by the
Smith-Waterman homology search algorithm as implemented in the MPSRCH program (Oxford
Molecular), using an affine gap search with parameters gap open penalty = 12 and gap extension
penalty = 1.
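As an illustration of the kind of calculation referred to above, the following is a minimal sketch of a Smith-Waterman local alignment with affine gap penalties (open = 12, extension = 1). It is not the MPSRCH implementation. The flat match/mismatch scores, the convention that the first gap position costs the open penalty and each further position costs the extension penalty, and the definition of identity as identical columns divided by alignment columns are all assumptions made for the example; a real search would normally use an amino acid substitution matrix.

```python
NEG_INF = float("-inf")


def _step(cell, delta, matched=0):
    """Extend an alignment by one column; cell is (score, matches, columns)."""
    score, matches, columns = cell
    return (score + delta, matches + matched, columns + 1)


def local_identity(a, b, match=5, mismatch=-4, gap_open=12, gap_ext=1):
    """Best local alignment of a and b under affine gap penalties.

    Returns (score, identical_columns, alignment_columns, identity).
    The match/mismatch scores are illustrative assumptions.
    """
    empty = (0, 0, 0)
    unreachable = (NEG_INF, 0, 0)
    rows, cols = len(a) + 1, len(b) + 1
    H = [[empty] * cols for _ in range(rows)]        # best alignment ending at (i, j)
    E = [[unreachable] * cols for _ in range(rows)]  # ending with a gap in a
    F = [[unreachable] * cols for _ in range(rows)]  # ending with a gap in b
    best = empty
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(_step(E[i][j - 1], -gap_ext), _step(H[i][j - 1], -gap_open))
            F[i][j] = max(_step(F[i - 1][j], -gap_ext), _step(H[i - 1][j], -gap_open))
            same = a[i - 1] == b[j - 1]
            diag = _step(H[i - 1][j - 1], match if same else mismatch, int(same))
            candidate = max(diag, E[i][j], F[i][j])
            # Local alignment: restart from an empty alignment when the score is not positive.
            H[i][j] = candidate if candidate[0] > 0 else empty
            best = max(best, H[i][j])
    score, matches, columns = best
    return score, matches, columns, (matches / columns if columns else 0.0)
```

With these assumed scores, aligning "AAAAAAGGGGGG" against "AAAAAATTTGGGGGG" opens one three-residue gap, and the identity over the aligned region can then be compared with a chosen threshold such as 50%.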

The invention further provides proteins comprising fragments of the Neisserial amino acid 
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12, 
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence. 
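The "at least n consecutive amino acids" criterion can be checked mechanically. The sketch below is one possible way to do so: it finds the longest run of consecutive residues shared by a candidate and a reference sequence. The function names and the example sequences used with it are hypothetical and are not taken from the examples.

```python
def longest_shared_run(candidate, reference):
    """Length of the longest stretch of consecutive residues common to both sequences."""
    best = 0
    prev = [0] * (len(reference) + 1)
    for c in candidate:
        curr = [0] * (len(reference) + 1)
        for j, r in enumerate(reference, start=1):
            if c == r:
                # Extend the run that ended at the previous residue of each sequence.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def contains_fragment(candidate, reference, n=7):
    """True if candidate shares at least n consecutive residues with reference."""
    return longest_shared_run(candidate, reference) >= n
```

For instance, a candidate that carries a seven-residue stretch of the reference satisfies the criterion for n = 7 but not for n = 8. The same check applies unchanged to nucleotide strings.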

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These 
may be polyclonal or monoclonal and may be produced by any suitable means. 

According to a further aspect, the invention provides nucleic acid comprising the Neisserial 
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide 
sequences disclosed in the examples. 

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid 
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1×SSC,
0.5% SDS solution).

Nucleic acid comprising fragments of these sequences is also provided. These should comprise at
least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention.

It should also be appreciated that the invention provides nucleic acid comprising sequences
complementary to those described above (eg. for antisense or probing purposes). 
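A complementary sequence of the kind mentioned above can be derived as sketched below. This is a minimal example for unambiguous DNA letters only; the function name is an assumption, and ambiguity codes and RNA are deliberately not handled.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Applying the function twice returns the original sequence, which is a convenient sanity check when designing antisense molecules or probes.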

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical 
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.).

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as 
those containing modified backbones, and also peptide nucleic acids (PNA) etc. 

According to a further aspect, the invention provides vectors comprising nucleotide sequences of 
the invention (eg. expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody,
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines, for 
instance, or as diagnostic reagents, or as immunogenic compositions. 

The invention also provides nucleic acid, protein, or antibody according to the invention for use as
medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid,
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain B
or strain C).

The invention also provides a method of treating a patient, comprising administering to the patient 
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the 
invention. 

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a
host cell according to the invention under conditions which induce protein expression.

A process for producing protein or nucleic acid of the invention is provided, wherein the
protein or nucleic acid is synthesised in part or in whole using chemical means.

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a)
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes.

A process for detecting proteins of the invention is provided, comprising the steps of: (a) 
contacting an antibody according to the invention with a biological sample under conditions 
suitable for the formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the 
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows.
This summary is not a limitation on the invention but, rather, gives examples that may be used, but 
are not required. 

General 

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning; A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and II
(D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid Hybridization
(B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames & S.J.
Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and Enzymes
(IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the Methods in
Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene Transfer Vectors
for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor Laboratory);
Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular Biology
(Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice, Second
Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes I-IV (D.M.
Weir and C.C. Blackwell eds. 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification.

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions

A composition containing X is "substantially free" of Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of 
X+Y in the composition, more preferably at least about 95% or even 99% by weight. 
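The definition above reduces to a simple weight-fraction test, sketched here. The function name and the choice to reject an empty composition are assumptions made for the example; the 85% default comes from the definition, and the preferred levels (about 90%, 95% or 99%) can be passed as the threshold.

```python
def substantially_free(mass_x, mass_y, threshold=0.85):
    """True if X makes up at least `threshold` of the combined weight of X and Y."""
    total = mass_x + mass_y
    if total <= 0:
        raise ValueError("composition must contain a positive total weight of X and Y")
    return mass_x / total >= threshold
```

A preparation containing 85 mg of X and 15 mg of Y just meets the definition, whereas 84 mg of X with 16 mg of Y does not.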

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising"
X may consist exclusively of X or may include something additional to X, such as X+Y.

A "conserved" Neisseria amino acid fragment or protein is one that is present in a particular
Neisserial protein in at least x% of Neisseria. The value of x may be 50% or more, e.g., 66%,
75%, 80%, 90%, 95% or even 100% (i.e. the amino acid is found in the protein in question in all
Neisseria). In order to determine whether an amino acid is "conserved" in a particular Neisserial
protein, it is necessary to compare that amino acid residue in the sequences of the protein in
question from a plurality of different Neisseria (a reference population). The reference population
may include a number of different Neisseria species or may include a single species. The reference
population may include a number of different serogroups of a particular species or a single
serogroup. A preferred reference population consists of the 5 most common Neisseria.

The term "heterologous" refers to two biological components that are not found together in nature. The
components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two
epitopes from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.
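Returning to the definition of a "conserved" amino acid given earlier in this section, the comparison across a reference population can be sketched as follows. The sketch assumes that the sequences of the protein from each member of the reference population have already been aligned to equal length, so that a given index refers to the same residue position in each; the function names and the threshold argument x (as a fraction) are hypothetical.

```python
from collections import Counter


def conservation(aligned_sequences, position):
    """Most common residue at a position and the fraction of sequences carrying it."""
    residues = [seq[position] for seq in aligned_sequences]
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def is_conserved(aligned_sequences, position, x=0.5):
    """True if the most common residue occurs in at least fraction x of the population."""
    return conservation(aligned_sequences, position)[1] >= x
```

With a reference population of four aligned sequences in which three share a residue at a position, that residue is conserved at x = 75% but not at x = 80%.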



CHIR-0160 (356.001) PATENT 


An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of 
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous 
unit of polynucleotide replication within a cell, capable of replication under its own control. An 
origin of replication may be needed for a vector to replicate in a particular host cell. With certain 
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7 
cells. 

A "mutant" sequence is defined as DNA, RNA or amino acid sequence differing from but having 
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the 
degree of sequence identity between the native or disclosed sequence and the mutant sequence is 
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the 
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid 
molecule, or region, that occurs essentially at the same locus in the genome of another or second 
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has a 
similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes a 
protein having similar activity to that of the protein encoded by the gene to which it is being 
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of 
the gene, such as in regulatory control regions (eg. see US patent 5,753,235). 

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems; 
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast. 

i. Mammalian Systems 

Mammalian expression systems are known in the art. A mammalian promoter is any DNA
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook et
al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].
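The positional relationship described above (a TATA box some 25-30 bp upstream of the transcription initiation site) can be checked on a candidate promoter sequence with a sketch such as the following. The function name is hypothetical, and several simplifications are assumed: the motif is taken as the bare string "TATA", the distance is measured from the first base of the motif to the initiation site, and the initiation site is supplied as a 0-based index into the sequence.

```python
def tata_boxes_upstream(seq, tss_index, motif="TATA", window=(25, 30)):
    """Distances (in bp upstream of the initiation site) at which the motif starts.

    Only start positions lying within `window` bp upstream of tss_index are tested.
    """
    near, far = window
    hits = []
    for distance in range(near, far + 1):
        start = tss_index - distance
        if start >= 0 and seq[start:start + len(motif)].upper() == motif:
            hits.append(distance)
    return hits
```

An empty result means no motif starts within the window, which may simply indicate a TATA-less promoter rather than an error in the construct.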

Mammalian viral genes are often highly expressed and have a broad host range; therefore
sequences encoding mammalian viral genes provide particularly useful promoter sequences.
Examples include the SV40 early promoter, mouse mammary tumor virus LTR promoter,
adenovirus major late promoter (Ad MLP), and herpes simplex virus promoter. In addition,
sequences derived from non-viral genes, such as the murine metallothionein gene, also provide
useful promoter sequences. Expression may be either constitutive or regulated (inducible),
depending on the promoter; an inducible promoter can, for example, be induced with
glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters,
with synthesis beginning at the normal RNA start site. Enhancers are also active when they are
placed upstream or downstream from the transcription initiation site, in either normal or flipped
orientation, or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al.
(1987) Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer
elements derived from viruses may be particularly useful, because they usually have a broader host
range. Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and
the enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl Acad. Sci. 79:6777] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only in
the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may
be directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of
the recombinant protein will always be a methionine, which is encoded by the ATG start codon. If
desired, the N-terminal methionine may be cleaved from the protein by in vitro incubation with
cyanogen bromide.

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment
that provides for secretion of the foreign protein in mammalian cells. Preferably, there are
processing sites encoded between the leader fragment and the foreign gene that can be cleaved
either in vivo or in vitro. The leader sequence fragment usually encodes a signal peptide comprised
of hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein in
mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by
site-specific post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:175] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol Cell Biol 9:946] and pHEBO [Shimizu et al.
(1986) Mol Cell Biol 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for 
introduction of heterologous polynucleotides into mammalian cells are known in the art and 
include dextran-mediated transfection, calcium phosphate precipitation, polybrene mediated 
transfection, protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in 
liposomes, and direct microinjection of the DNA into nuclei.

Mammalian cell lines available as hosts for expression are known in the art and include many 
immortalized cell lines available from the American Type Culture Collection (ATCC), including 
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) 
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a 
number of other cell lines.

ii. Baculovirus Systems 

The polynucleotide encoding the protein can also be inserted into a suitable insect expression
vector, and is operably linked to the control elements within that vector. Vector construction
employs techniques which are known in the art. Generally, the components of the expression
system include a transfer vector, usually a bacterial plasmid, which contains both a fragment of the
baculovirus genome, and a convenient restriction site for insertion of the heterologous gene or
genes to be expressed; a wild type baculovirus with a sequence homologous to the
baculovirus-specific fragment in the transfer vector (this allows for the homologous recombination of the
heterologous gene into the baculovirus genome); and appropriate insect host cells and growth
media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the
wild type viral genome are transfected into an insect host cell where the vector and viral genome
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques
are identified and purified. Materials and methods for baculovirus/insect cell expression systems
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers
and Smith").

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above 
described components, comprising a promoter, leader (if desired), coding sequence of interest, and 
transcription termination sequence, are usually assembled into an intermediate transplacement 
construct (transfer vector). This construct may contain a single gene and operably linked regulatory 
elements; multiple genes, each with its own set of operably linked regulatory elements; or
multiple genes, regulated by the same set of regulatory elements. Intermediate transplacement 
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids) 
capable of stable maintenance in a host, such as a bacterium. The replicon will have a replication 
system, thus allowing it to be maintained in a suitable host for cloning and amplification. 

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 17:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann.
Rev. Microbiol., 42:171) and a prokaryotic ampicillin-resistance (amp) gene and origin of
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:765.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Natl Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 58:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused
foreign proteins usually requires heterologous genes that ideally have a short leader sequence
containing suitable translation initiation signals preceding an ATG start signal. If desired,
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide.

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be 
secreted from the insect cell by creating chimeric DNA molecules that encode a fusion protein 
comprised of a leader sequence fragment that provides for secretion of the foreign protein in
insects. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic 
amino acids which direct the translocation of the protein into the endoplasmic reticulum. 

After insertion of the DNA sequence and/or the gene encoding the expression product precursor of 
the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer vector
and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter and
transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene. 
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin 
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is 
positioned downstream of the polyhedrin promoter. 

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 µm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989). 
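
By way of illustration only, the practical consequence of the 1% to 5% recombination frequency stated above can be estimated with a short calculation. The sketch below assumes that each plaque is independently recombinant with a fixed probability; that independence assumption, the function names and the 95% confidence target are illustrative choices and are not taken from the cited references.

```python
import math


def prob_at_least_one(freq: float, n_plaques: int) -> float:
    """Probability that at least one of n_plaques is recombinant,
    assuming each plaque is independently recombinant with probability freq."""
    return 1.0 - (1.0 - freq) ** n_plaques


def plaques_needed(freq: float, confidence: float = 0.95) -> int:
    """Smallest number of plaques to screen so that at least one recombinant
    is expected with the requested confidence (same independence assumption)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


if __name__ == "__main__":
    for freq in (0.01, 0.05):
        n = plaques_needed(freq)
        print(f"frequency {freq:.0%}: screen about {n} plaques")
```

Under these assumptions, a few hundred plaques would need to be examined at the lower end of the stated frequency range, which is consistent with the usefulness of a rapid visual screen.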

Recombinant baculovirus expression vectors have been developed for infection into several insect 
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti , 
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and 
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of 
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally 
known to those skilled in the art. See, eg. Summers and Smith supra.

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for 
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression 
product gene is under inducible control, the host may be grown to high density, and expression 
induced. Alternatively, where expression is constitutive, the product will be continuously expressed
into the medium and the nutrient medium must be continuously circulated, while removing the
product of interest and augmenting depleted nutrients. The product may be purified by such
techniques as chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography,
etc.; electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate,
the product may be further purified, as required, so as to remove substantially any insect proteins
which are also secreted in the medium or result from lysis of insect cells, so as to provide a product
which is at least substantially free of host debris, eg. proteins, lipids and polysaccharides. 

In order to obtain protein expression, recombinant host cells derived from the transformants are 
incubated under conditions which allow expression of the recombinant protein encoding sequence. 
These conditions will vary, dependent upon the host cell selected. However, the conditions are
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art. 

iii. Plant Systems 

There are many plant cell culture and whole plant genetic expression systems known in the art. 
Exemplary plant cellular genetic expression systems include those described in patents, such as:
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found, in addition to the references described above, in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone gibberellic acid, and of secreted enzymes induced by
gibberellic acid, can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an 
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream 
and downstream from the expression cassette suitable for expression in a plant host. The 
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the 
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the 
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium 
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes. 
Where the heterologous gene is not readily amenable to detection, the construct will preferably also 
have a selectable marker gene suitable for determining if a plant cell has been transformed. A 
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome 
are also recommended. These might include transposon sequences and the like for homologous 
recombination as well as Ti sequences which permit random insertion of a heterologous expression
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward 
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions 
may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette for 
expression of the protein(s) of interest. Usually, there will be only one expression cassette,
although two or more are feasible. In addition to the heterologous protein encoding sequence, the
recombinant expression cassette will contain the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene
comes equipped with one), and a transcription and translation termination sequence. Unique
restriction enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a
pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The 
sequence encoding the protein of interest will encode a signal peptide which allows processing and 
translocation of the protein, as appropriate, and will usually lack any sequence which might result 
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically,
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein. 

Since the ultimate expression of the desired gene product will be in a eucaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code,
Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically 
transfer the recombinant DNA. Crossway, Mol. Gen. Genet., 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al., 
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high 
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of 
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and 
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other 
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc. 
Natl Acad. Sci. USA, 79, 1859-1863, 1982. 

The vector may also be introduced into the plant cells by electroporation (Fromm et al., Proc.
Natl Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the
presence of plasmids containing the gene construct. Electrical impulses of high field strength 
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated 
plant protoplasts reform the cell wall, divide, and form plant callus. 

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can 
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura. 

Means for regeneration vary from species to species of plants, but generally a suspension of 
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural 
embryos to form plants. The culture media will generally contain various amino acids and 
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline 
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the
history of the culture. If these three variables are controlled, then regeneration is fully reproducible 
and repeatable. 

In some plant cell culture systems, the desired protein of the invention may be excreted or,
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the
invention is secreted into the medium, it may be collected. Alternatively, the embryos and
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will then be used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein.

iv. Bacterial Systems 

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence 
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of a 
coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation
site. A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence of
negative regulatory elements, such as the operator. In addition, positive regulation may be achieved
by a gene activator protein binding sequence, which, if present, is usually proximal (5') to the RNA
polymerase binding sequence. An example of a gene activator protein is the catabolite activator
protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E. coli) 
[Raibaud et al. (1984) Annu. Rev. Genet 18:113]. Regulated expression may therefore be either 
positive or negative, thereby either enhancing or reducing transcription. 

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as
galactose, lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples
include promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel
et al. (1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes" In Interferon 3 (ed. I.
Gresser)], bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US
patent 4,689,406] promoter systems also provide useful promoter sequences. 

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters. 
For example, transcription activation sequences of one bacterial or bacteriophage promoter may be
joined with the operon sequences of another bacterial or bacteriophage promoter, creating a
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region (EPO-
A-0 267 851). 

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for 
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the 
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al (1989) "Expression of cloned 
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual], 
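
By way of illustration only, the positional rule stated above (a sequence 3-9 nucleotides in length located 3-11 nucleotides upstream of the initiation codon) can be expressed as a simple search. The sketch below makes several assumptions that do not come from the cited references: the consensus string TAAGGAGGT is taken as the DNA-sense complement of the 3' end of E. coli 16S rRNA, only exact matches are considered, the spacing is counted as the number of nucleotides between the end of the match and the A of the ATG, and the function name find_sd is hypothetical.

```python
# Assumed consensus: DNA-sense complement of the 3' end of E. coli 16S rRNA.
SD_CONSENSUS = "TAAGGAGGT"


def find_sd(seq, atg_index, min_len=3, max_len=9, min_gap=3, max_gap=11):
    """Return (motif, start, gap) for the longest exact sub-match of
    SD_CONSENSUS lying min_gap..max_gap nucleotides upstream of the ATG
    at atg_index, or None when no such match exists."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at atg_index")
    # Try the longest candidate motifs first so the best match is returned.
    for length in range(max_len, min_len - 1, -1):
        for k in range(len(SD_CONSENSUS) - length + 1):
            motif = SD_CONSENSUS[k:k + length]
            for gap in range(min_gap, max_gap + 1):
                end = atg_index - gap      # first position after the motif
                start = end - length
                if start >= 0 and seq[start:end] == motif:
                    return motif, start, gap
    return None


if __name__ == "__main__":
    # Hypothetical test sequence with AGGAGG six nucleotides upstream of ATG.
    print(find_sd("CCCAGGAGGCTTCACATGGCT", 15))
```

A sequence in which no match is found in the window would be a candidate "weak ribosome-binding site" in the sense used above, although an exact-match scan of this kind is much cruder than an evaluation of base-pairing strength.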

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked 
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a 
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the 
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign
gene [Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from
the lacZ [Jia et al. (1987) Gene 60:197], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another
example is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that
preferably retains a site for a processing enzyme (eg. ubiquitin specific processing-protease) to
cleave the ubiquitin from the foreign protein. Through this method, native foreign protein can be 
isolated [Miller et al. (1989) Bio/Technology 7:698]. 

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA 
molecules that encode a fusion protein comprised of a signal peptide sequence fragment that
provides for secretion of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence 
fragment usually encodes a signal peptide comprised of hydrophobic amino acids which direct the 
secretion of the protein from the cell. The protein is either secreted into the growth media (gram- 
positive bacteria) or into the periplasmic space, located between the inner and outer membrane of 
the cell (gram-negative bacteria). Preferably there are processing sites, which can be cleaved either
in vivo or in vitro, encoded between the signal peptide fragment and the foreign gene.
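
By way of illustration only, the hydrophobic character of a candidate signal peptide can be scored numerically. The sketch below uses the Kyte-Doolittle hydropathy scale, which is an outside choice and is not named in this specification; the eight-residue window, the function names and the test peptide are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed scoring scale, see lead-in).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average hydropathy of a peptide given in one-letter code."""
    return sum(KD[aa] for aa in peptide.upper()) / len(peptide)


def most_hydrophobic_window(protein, window=8):
    """Return (start, score) of the window with the highest mean hydropathy;
    the first such window is returned when several tie."""
    if len(protein) < window:
        raise ValueError("protein shorter than window")
    scores = [
        (mean_hydropathy(protein[i:i + window]), i)
        for i in range(len(protein) - window + 1)
    ]
    best_score, best_start = max(scores, key=lambda item: item[0])
    return best_start, best_score


if __name__ == "__main__":
    # Hypothetical peptide: charged N-terminus followed by a leucine core.
    print(most_hydrophobic_window("MKKLLLLLLLLDDD"))
```

A strongly positive window near the N-terminus is what would be expected of a signal peptide "comprised of hydrophobic amino acids"; the score is a rough screen only and does not predict cleavage sites.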

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins, 
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains 
can be used to secrete heterologous proteins from B. subtilis [Palva et al (1982) Proc. Natl. Acad. 
Sci. USA 79:5582; EP-A-0 244 042]. 

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 
3' to the translation stop codon, and thus together with the promoter flank the coding sequence.
These sequences direct the transcription of an mRNA which can be translated into the polypeptide 
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of 
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription. 
Examples include transcription termination sequences derived from genes with strong promoters, 
such as the trp gene in E. coli as well as other biosynthetic genes.
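
By way of illustration only, the stem loop structures mentioned above can be located in a DNA sequence by searching for inverted repeats. The sketch below is a deliberately simple scan for exact inverted repeats; the stem length of 6, the loop range of 3 to 8 nucleotides and the function names are illustrative assumptions, and no thermodynamic stability is evaluated.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Return a list of (start, stem_length, loop_length) for every exact
    inverted repeat: a left arm at start, a loop, then the reverse
    complement of the left arm as the right arm."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = revcomp(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop            # start of the candidate right arm
            right = seq[j:j + stem]
            if len(right) < stem:
                break                      # ran off the end of the sequence
            if right == target:
                hits.append((i, stem, loop))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: GCCGCC, a TAAT loop, then GGCGGC.
    print(find_stem_loops("CCGCCGCCTAATGGCGGCCCCC"))
```

Candidate terminator regions of about 50 nucleotides could be passed to such a function as a first screen before any more rigorous structural analysis.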

Usually, the above described components, comprising a promoter, signal sequence (if desired), 
coding sequence of interest, and transcription termination sequence, are put together into 
expression constructs. Expression constructs are often maintained in a replicon, such as an 
extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as bacteria.
The replicon will have a replication system, thus allowing it to be maintained in a prokaryotic host 
either for expression or for cloning and amplification. In addition, a replicon may be either a high 
or low copy number plasmid. A high copy number plasmid will generally have a copy number 
ranging from about 5 to about 200, and usually about 10 to about 150. A host containing a high
copy number plasmid will preferably contain at least about 10, and more preferably at least about 
20 plasmids. Either a high or low copy number vector may be selected, depending upon the effect 
of the vector and the foreign protein on the host. 

Alternatively, the expression constructs can be integrated into the bacterial genome with an 
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from 
recombinations between homologous DNA in the vector and the bacterial chromosome. For 
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the 
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of bacterial strains that have been transformed. Selectable markers can be 
expressed in the bacterial host and may include genes which render bacteria resistant to drugs such 
as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline [Davies et 
al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include biosynthetic genes,
such as those in the histidine, tryptophan, and leucine biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extra-chromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually 
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:273; Palva et al. (1982) Proc. Natl Acad. Sci.
USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen et
al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6127;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids." In Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318, Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:173, Lactobacillus], [Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas], [Augustin et al.
(1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Evr. Cong. Biotechnology 1:412, Streptococcus].

v. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site (the
"TATA Box") and a transcription initiation site. A yeast promoter may also have a second domain
called an upstream activator sequence (UAS), which, if present, is usually distal to the structural
gene. The UAS permits regulated (inducible) expression. Constitutive expression occurs in the
absence of a UAS. Regulated expression may be either positive or negative, thereby either
enhancing or reducing transcription.

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-phosphate
isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For 
example, UAS sequences of one yeast promoter may be joined with the transcription activation
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 283:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the 
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If 
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with 
5 cyanogen bromide. 
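
The relationship between the ATG start codon, the N-terminal methionine and cyanogen bromide treatment can be sketched in Python. This is a minimal illustration only: the sequences are invented, the function names are hypothetical, and the model relies on the general chemistry that cyanogen bromide cleaves on the C-terminal side of every methionine residue, so internal methionines would also be cut.

```python
# Illustrative sketch; sequences and function names are hypothetical, not from this description.
START_CODON = "ATG"


def starts_with_start_codon(coding_sequence: str) -> bool:
    """Return True if the coding sequence begins with the ATG start codon."""
    return coding_sequence.upper().startswith(START_CODON)


def cnbr_fragments(protein: str) -> list:
    """Split a one-letter protein string after every methionine (M).

    Models cyanogen bromide cleavage on the C-terminal side of Met.
    The chemical conversion of Met to homoserine lactone is not modelled.
    """
    fragments = []
    current = ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # A directly expressed protein with no internal Met: only the initiator Met is released.
    print(cnbr_fragments("MKTAYIAK"))
    # A protein with an internal Met is cut into additional fragments.
    print(cnbr_fragments("MKTMAY"))
```

The second call shows why this approach suits proteins that lack internal methionine residues.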

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian,
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment
that provides for secretion in yeast of the foreign protein. Preferably, there are processing sites
encoded between the leader fragment and the foreign gene that can be cleaved either in vivo or in
vitro. The leader sequence fragment usually encodes a signal peptide comprised of hydrophobic
amino acids which direct the secretion of the protein from the cell.

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins,
such as the yeast invertase gene (EP-A-0 012 873; JPO 62,096,086) and the A-factor gene (US
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057).

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing an
alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples include transcription terminator sequences and other
yeast-recognized termination sequences, such as those of genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such
yeast-bacteria shuttle vectors include YEp24 [Botstein et al (1979) Gene 8:17-24], pCl/1 [Brake et al
(1984) Proc. Natl. Acad. Sci USA 81:4642-4646], and YRp17 [Stinchcomb et al (1982) J. Mol
Biol 158:157]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See
eg. Brake et al, supra.

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al. (1983) Proc. Natl Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results in
the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers
to allow for the selection of yeast strains that have been transformed. Selectable markers may
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al (1987)
Microbiol Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above.

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz, et al (1986) Mol
Cell Biol 6:142], Candida maltosa [Kunze, et al (1985) J. Basic Microbiol 25:141], Hansenula
polymorpha [Gleeson, et al (1986) J. Gen. Microbiol 132:3459; Roggenkamp et al (1986) Mol
Gen. Genet. 202:302], Kluyveromyces fragilis [Das, et al (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al (1983) J. Bacteriol. 154:737; Van den Berg et al.
(1990) Bio/Technology 8:135], Pichia guilliermondii [Kunze et al (1985) J. Basic Microbiol
25:141], Pichia pastoris [Cregg, et al. (1985) Mol Cell Biol 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al (1978) Proc. Natl Acad. Sci. USA
75:1929; Ito et al (1983) J. Bacteriol 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow, et al (1985) Curr. Genet 10:39;
Gaillardin, et al (1985) Curr. Genet 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz et
al (1986) Mol Cell Biol 6:142; Kunze et al (1985) J. Basic Microbiol 25:141; Candida];
[Gleeson et al (1986) J. Gen. Microbiol 132:3459; Roggenkamp et al (1986) Mol Gen. Genet.
202:302; Hansenula]; [Das et al (1984) J. Bacteriol. 158:1165; De Louvencourt et al (1983) J.
Bacteriol 154:1165; Van den Berg et al (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg et
al (1985) Mol Cell Biol 5:3376; Kunze et al. (1985) J. Basic Microbiol 25:141; US Patent Nos.
4,837,148 and 4,929,555; Pichia]; [Hinnen et al (1978) Proc. Natl Acad. Sci. USA 75:1929; Ito et
al (1983) J. Bacteriol 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al (1985) Curr. Genet 10:39; Gaillardin et al. (1985) Curr.
Genet 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional
binding space with an internal surface shape and charge distribution complementary to the features
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody"
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins.

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by
conventional methods. In general, the protein is first used to immunize a suitable animal,
preferably a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of
polyclonal sera due to the volume of serum obtainable, and the availability of labeled anti-rabbit
and anti-goat antibodies. Immunization is generally performed by mixing or emulsifying the
protein in saline, preferably in an adjuvant such as Freund's complete adjuvant, and injecting the
mixture or emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200
µg/injection is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or
more injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antiserum is obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be
obtained from rabbits.
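
The centrifugation figure of 1,000g can be related to rotor speed using the standard relation RCF = 1.118 x 10^-5 x r x N^2, where r is the rotor radius in cm and N is the speed in rpm. The Python sketch below applies that relation; the 10 cm rotor radius and the function names are assumptions chosen for illustration and do not come from this description.

```python
import math

# Standard conversion factor for radius in centimetres and speed in rpm.
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed in rpm needed to reach a given relative centrifugal force."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius_cm must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    # Hypothetical rotor radius of 10 cm; the actual value depends on the rotor used.
    speed = rpm_from_rcf(1000.0, 10.0)
    print(f"1,000g at 10 cm radius is roughly {speed:.0f} rpm")
```

Because the required speed depends on rotor radius, specifying the force in g rather than rpm keeps the step independent of the centrifuge used.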

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting
dilution, and are assayed for the production of antibodies which bind specifically to the
immunizing antigen (and which do not bind to unrelated antigens). The selected MAb-secreting
hybridomas are then cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors),
or in vivo (as ascites in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly ³²P
and ¹²⁵I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by
its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand
molecule with high specificity, as for example in the case of an antigen and a monoclonal antibody
specific therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and
protein A, and the numerous receptor-ligand couples known in the art. It should be understood that
the above description is not meant to categorize the various labels into distinct classes, as the same
label may serve in several different modes. For example, ¹²⁵I may serve as a radioactive label or as
an electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may
combine various labels for desired effect. For example, MAbs and avidin also require labels in the
practice of this invention: thus, one might label a MAb with biotin, and detect its presence with
avidin labeled with ¹²⁵I, or with an anti-biotin MAb labeled with HRP. Other permutations and
possibilities will be readily apparent to those of ordinary skill in the art, and are considered as
equivalents within the scope of the instant invention.

Pharmaceutical Compositions 

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of
either polypeptides, antibodies, or polynucleotides of the claimed invention.

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount
in advance. However, the effective amount for a given situation can be determined by routine
experimentation and is within the judgement of the clinician.

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50
mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered.
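
As a purely arithmetic illustration of the per-kilogram ranges above, the Python sketch below converts them into total amounts for a given body weight. The function name and the 70 kg example weight are assumptions for illustration; the calculation says nothing about what amount is appropriate for any subject, which remains a matter for routine experimentation and clinical judgement.

```python
# Per-kilogram ranges quoted in the text (mg of DNA construct per kg of body weight).
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_window_mg(weight_kg: float, range_mg_per_kg=BROAD_RANGE_MG_PER_KG):
    """Return (lower, upper) total amounts in mg for a body weight in kg.

    Hypothetical helper: it only multiplies the range bounds by the weight.
    """
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = range_mg_per_kg
    return (low * weight_kg, high * weight_kg)


if __name__ == "__main__":
    # 70 kg is an arbitrary example weight, not a value from this description.
    for label, bounds in (("broad", BROAD_RANGE_MG_PER_KG), ("narrow", NARROW_RANGE_MG_PER_KG)):
        low, high = dose_window_mg(70.0, bounds)
        print(f"{label}: {low:.2f} mg to {high:.2f} mg")
```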

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent,
such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity.
Suitable carriers may be large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid
copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in
the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences
(Mack Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as
water, saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.
Typically, the therapeutic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. Liposomes are included within the definition of a pharmaceutically
acceptable carrier.

Delivery Methods

Once formulated, the compositions of the invention can be administered directly to the subject. The 
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either
subcutaneously, intraperitoneally, intravenously or intramuscularly, or by delivery to the interstitial
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage
treatment may be a single dose schedule or a multiple dose schedule.

Vaccines

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or 
therapeutic (ie. to treat disease after infection). 

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or
nucleic acid, usually in combination with "pharmaceutically acceptable carriers," which include
any carrier that does not itself induce the production of antibodies harmful to the individual
receiving the composition. Suitable carriers are typically large, slowly metabolized
macromolecules such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric
amino acids, amino acid copolymers, lipid aggregates (such as oil droplets or liposomes), and
inactive virus particles. Such carriers are well known to those of ordinary skill in the art.

Additionally, these carriers may function as immunostimulating agents ("adjuvants"). Furthermore,
the antigen or immunogen may be conjugated to a bacterial toxoid, such as a toxoid from
diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5%
pluronic-blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron
emulsion or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS),
(Ribi Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more
bacterial cell wall components from the group consisting of monophosphorylipid A (MPL),
trehalose dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™);
(3) saponin adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used
or particles generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete
Freund's Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as
interleukins (eg. IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon),
macrophage colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other
substances that act as immunostimulating agents to enhance the effectiveness of the composition.
Alum and MF59™ are preferred.

As mentioned above, muramyl peptides include, but are not limited to,
N-acetyl-muramyl-L-threonyl-D-isoglutamine (thr-MDP),
N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP),
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine
(MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents,
such as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers.

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components,
as needed. By "immunologically effective amount", it is meant that the administration of that
amount to an individual, either in a single dose or as part of a series, is effective for treatment or
prevention. This amount varies depending upon the health and physical condition of the individual
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate,
etc.), the capacity of the individual's immune system to synthesize antibodies, the degree of
protection desired, the formulation of the vaccine, the treating doctor's assessment of the medical
situation, and other relevant factors. It is expected that the amount will fall in a relatively broad
range that can be determined through routine trials.

The immunogenic compositions are conventionally administered parenterally, eg. by injection,
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734).
Additional formulations suitable for other modes of administration include oral and pulmonary
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al. (1997) Annu Rev Immunol 
15:617-648; see later herein]. 

Gene Delivery Vehicles

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of the
invention, to be delivered to the mammal for expression in the mammal, can be administered either
locally or systemically. These constructs can utilize viral or non-viral vector approaches in in vivo
or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated.

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral,
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus,
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153.

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy
vector is employable in the invention, including B, C and D type retroviruses, xenotropic
retroviruses (for example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol 53:160)),
polytropic retroviruses eg. MCF and MCF-MLV (see Kelly (1983) J. Virol 45:291), spumaviruses
and lentiviruses. See RNA Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For 
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site 
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of 
second strand synthesis from an Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It is
preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known in
the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg.
HT1080 cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also
Vile (1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer
Res 53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller
(1990) Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.
Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5
native nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to
18 native nucleotides, most preferably 10 native nucleotides are retained and the remaining
nucleotides of the D-sequence are deleted or replaced with non-native nucleotides. The native
D-sequences of the AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in
each AAV inverted terminal repeat (ie. there is one sequence at each end) which are not involved
in HP formation. The non-native replacement nucleotide may be any nucleotide other than the
nucleotide found in the native D-sequence in the same position. Other employable exemplary AAV
vectors are pWP-19, pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262.
Another example of such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096).
Another exemplary AAV vector is the Double-D ITR vector. Construction of the Double-D ITR
vector is disclosed in US Patent 5,478,745. Still other vectors are those disclosed in Carter US
Patent 4,797,368 and Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin
WO94/288157. Yet a further example of an AAV vector employable in this invention is
SSV9AFABTKneo, which contains the AFP enhancer and albumin promoter and directs
expression predominantly in the liver. Its structure and construction are disclosed in Su (1996)
Human Gene Therapy 7:463-470. Additional AAV gene therapy vectors are described in US
5,354,678, US 5,173,414, US 5,139,941, and US 5,252,479.

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19
and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention. 
Preferred alpha virus vectors are Sindbis virus vectors. Also employable are togaviruses, Semliki 
Forest virus (ATCC VR-67; ATCC VR-1247), Middelburg virus (ATCC VR-370), Ross River virus 
(ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR923; ATCC 
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309 and 
5,217,879, and in WO92/10578. More particularly, those alpha virus vectors described in US Serial 
No. 08/405,627, filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 
and US 5,217,879 are employable. Such alpha viruses may be obtained from depositories or 
collections such as the ATCC in Rockville, Maryland or isolated from known sources using 
commonly available techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used 
(see USSN 08/679640). 

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing 
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered 
expression systems. Preferably, the eukaryotic layered expression systems of the invention are 
derived from alphavirus vectors and most preferably from Sindbis viral vectors. 

Other viral vectors suitable for use in the present invention include those derived from poliovirus, 
for example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973) 
J. Biol. Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in 
Arnold (1990) J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for 
example ATCC VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl 
Acad Sci 86:317; Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US 
4,603,112 and US 4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those 
described in Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza 
virus, for example ATCC VR-797 and recombinant influenza viruses made employing reverse 
genetics techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci 
87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see 
also McMichael (1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979) 
277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992) 
J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in 
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600 
and ATCC VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for 
example ATCC VR-64 and ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924; 
Getah virus, for example ATCC VR-369 and ATCC VR-1243; Kyzylagach virus, for example 
ATCC VR-927; Mayaro virus, for example ATCC VR-66; Mucambo virus, for example ATCC 
VR-580 and ATCC VR-1244; Ndumu virus, for example ATCC VR-371; Pixuna virus, for 
example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example ATCC VR-925; Triniti 
virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; Whataroa virus, for 
example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus, Eastern 
encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for 
example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus, 
for example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190. 

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral 
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid 
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for 
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene 
Ther 3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987, 
eukaryotic cell delivery vehicle cells, for example see US Serial No. 08/240,030, filed May 9, 
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials, 
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as 
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell 
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and 
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585. 

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867. 
Briefly, the sequence can be inserted into conventional vectors that contain conventional control 
sequences for high level expression, and then incubated with synthetic gene transfer molecules 
such as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell 
targeting ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem. 
262:4429-4432, insulin as described in Huckett (1990) Biochem Pharmacol 40:253-263, galactose 
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin. 

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described in 
WO 90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex 
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by 
the beads. The method may be improved further by treatment of the beads to increase 
hydrophobicity and thereby facilitate disruption of the endosome and release of the DNA into the 
cytoplasm. 

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796, 
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral 
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional 
vectors that contain conventional control sequences for high level expression, and then be 
incubated with synthetic gene transfer molecules such as polymeric DNA-binding cations like 
polylysine, protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, 
insulin, galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to 
encapsulate DNA comprising the gene under the control of a variety of tissue-specific or 
ubiquitously-active promoters. Further non-viral delivery suitable for use includes mechanical 
delivery systems such as the approach described in Woffendin et al (1994) Proc. Natl. Acad. Sci. 
USA 91(24):11581-11585. Moreover, the coding sequence and the product of expression of such 
can be delivered through deposition of photopolymerized hydrogel materials. Other conventional 
methods for gene delivery that can be used for delivery of the coding sequence include, for 
example, use of hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing 
radiation for activating transferred gene, as described in US 5,206,152 and WO92/11033. 

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120 
and 4,762,915; in WO95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer, 
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem 
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol 
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420. 

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy 
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will 
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in 
the individual to which it is administered. 

Delivery Methods 

Once formulated, the polynucleotide compositions of the invention can be (1) administered directly 
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) used in vitro for 
expression of recombinant proteins. The subjects to be treated can be mammals or birds. Also, 
human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either 
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial 
space of a tissue. The compositions can also be administered into a lesion. Other modes of 
administration include oral and pulmonary administration, suppositories, and transdermal or 
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage 
treatment may be a single dose schedule or a multiple dose schedule. 

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known 
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications 
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic 
cells, or tumor cells. 

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished 
by procedures such as dextran-mediated transfection, calcium phosphate 
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of 
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well 
known in the art. 

Polynucleotide and polypeptide pharmaceutical compositions 

In addition to the pharmaceutically acceptable carriers and salts described above, the following 
additional agents can be used with polynucleotide and/or polypeptide compositions. 

A. Polypeptides 

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR); 
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons; 
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating 
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and 
erythropoietin. Viral antigens, such as envelope proteins, can also be used, as can proteins from 
other invasive organisms, such as the 17 amino acid peptide from the circumsporozoite protein of 
Plasmodium falciparum known as RII. 

B. Hormones, Vitamins, etc. 

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, 
thyroid hormone, or vitamins, folic acid. 

C. Polyalkylenes, Polysaccharides, etc. 

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a 
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or 
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is 
dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included. 

D. Lipids and Liposomes 

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in 
liposomes prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or 
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary 
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the 
use of liposomes as carriers for delivery of nucleic acids, see Hug and Sleight (1991) Biochim. 
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth Enzymol 101:512-527. 

Liposomal preparations for use in the present invention include cationic (positively charged), 
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to 
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA 
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified 
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form. 

Cationic liposomes are readily available. For example, 
N-[1-(2,3-dioleyloxy)propyl]-N,N,N-trimethylammonium (DOTMA) liposomes are available under 
the trademark Lipofectin, from GIBCO BRL, Grand Island, NY. (See, also, Felgner supra). Other 
commercially available liposomes include transfectace (DDAB/DOPE) and DOTAP/DOPE 
(Boehringer). Other cationic liposomes can be prepared from readily available materials using 
techniques well known in the art. See, eg. Szoka (1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; 
WO90/11092 for a description of the synthesis of DOTAP 
(1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes. 

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids 
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials 
include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline 
(DOPC), dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE), 
among others. These materials can also be mixed with the DOTMA and DOTAP starting materials 
in appropriate ratios. Methods for making liposomes using these materials are well known in the 
art. 

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs), 
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared 
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka 
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta 
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629; 
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA 
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol. 
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci. USA 75:145; 
and Schaefer-Ridder (1982) Science 215:166. 

E. Lipoproteins 

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. 
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. 
Mutants, fragments, or fusions of these proteins can also be used. Also, modifications of naturally 
occurring lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the 
delivery of polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are 
included with the polynucleotide to be delivered, no other targeting ligand is included in the 
composition. 

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are 
known as apoproteins. At present, apoproteins A, B, C, D, and E have been isolated and 
identified. At least two of these contain several proteins, designated by Roman numerals: AI, AII, 
AIV; CI, CII, CIII. 

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring 
chylomicrons comprise A, B, C, and E apoproteins; over time these lipoproteins lose A and acquire 
C and E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; 
and HDL comprises apoproteins A, C, and E. 

The amino acid sequences of these apoproteins are known and are described in, for example, 
Breslow (1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen 
(1986) J Biol Chem 261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann 
(1984) Hum Genet 65:232. 

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and 
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example, 
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of 
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol 128 (1986). The 
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding 
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and 
association with the polynucleotide binding molecule. 

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance. 
Such methods are described in Meth. Enzymol (supra); Pitas (1980) J. Biochem. 255:5454-5460 
and Mahley (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or 
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for 
example, Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys 
Acta 30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical 
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be 
found in Zuckermann et al PCT/US97/14465. 

F. Polycationic Agents 

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired 
polynucleotide/polypeptide to be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are 
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired 
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can 
be used to deliver nucleic acids to a living subject intramuscularly, subcutaneously, etc. 

The following are examples of useful polypeptides as polycationic agents: polylysine, 
polyarginine, polyornithine, and protamine. Other examples include histones, protamines, human 
serum albumin, DNA binding proteins, non-histone chromosomal proteins, and coat proteins from 
DNA viruses, such as ΦX174. Transcriptional factors also contain domains that bind DNA and 
therefore may be useful as nucleic acid condensing agents. Briefly, transcriptional factors such as 
C/CEBP, c-jun, c-fos, AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID 
contain basic domains that bind DNA sequences. 

Organic polycationic agents include: spermine, spermidine, and putrescine. 

The dimensions and the physical properties of a polycationic agent can be extrapolated from the 
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic 
agents. 

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene. 
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when 
combined with polynucleotides/polypeptides. 

Immunodiagnostic Assays 

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or, 
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based 
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods. 
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum 
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and a 
variety of these are known in the art. Protocols for the immunoassay may be based, for example, 
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use 
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody 
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye 
molecules. Assays which amplify the signals from the probe are also known; examples of which 
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such 
as ELISA assays. 

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed 
by packaging the appropriate materials, including the compositions of the invention, in suitable 
containers, along with the remaining reagents and materials (for example, suitable buffers, salt 
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions. 

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in 
solution. Then, the two sequences will be placed in contact with one another under conditions that 
favor hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; 
reaction temperature; time of hybridization; agitation; agents to block the non-specific attachment 
of the liquid phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration 
of the sequences; use of compounds to increase the rate of association of sequences (dextran sulfate 
or polyethylene glycol); and the stringency of the washing conditions following hybridization. See 
Sambrook et al [supra] Volume 2, chapter 9, pages 9.47 to 9.57. 

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar 
sequences over sequences that differ. For example, the combination of temperature and salt 
concentration should be chosen that is approximately 120 to 200°C below the calculated Tm of 
the hybrid under study. The temperature and salt conditions can often be determined empirically in 
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al at page 9.50. 

Variables to consider when performing, for example, a Southern blot are (1) the complexity of the 
DNA being blotted and (2) the homology between the probe and the sequences being detected. The 
total amount of the fragment(s) to be studied can vary a magnitude of 10, from 0.1 to 1 µg for a 
plasmid or phage digest to 10^-9 to 10^-8 g for a single copy gene in a highly complex eukaryotic 
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and 
exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes 
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only 1 
hour starting with 1 µg of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with a 
probe of 10^8 cpm/µg. For a single-copy mammalian gene a conservative approach would start with 
10 µg of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate 
using a probe of greater than 10^8 cpm/µg, resulting in an exposure time of ~24 hours. 

Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe 
and the fragment of interest, and consequently, the appropriate conditions for hybridization and 
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly 
encountered variables include the length and total G+C content of the hybridizing sequences and 
the ionic strength and formamide content of the hybridization buffer. The effects of all of these 
factors can be approximated by a single equation: 

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch) 

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal Biochem. 138: 267-284). 
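
The equation above can be evaluated directly. The following is a minimal sketch only, not part of the disclosed method; the function name and parameter names are illustrative assumptions, and the salt concentration Ci is assumed to be expressed in mol/L of monovalent ions.

```python
import math


def melting_temperature(salt_molar, gc_percent, formamide_percent,
                        hybrid_length_bp, mismatch_percent=0.0):
    """Approximate Tm (in degrees C) of a DNA-DNA hybrid.

    Implements: Tm = 81 + 16.6*log10(Ci) + 0.4*(%G+C)
                     - 0.6*(%formamide) - 600/n - 1.5*(%mismatch)
    All names are hypothetical; Ci is assumed to be in mol/L.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if hybrid_length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / hybrid_length_bp
            - 1.5 * mismatch_percent)


# Example: a 600 bp hybrid, 50% G+C, 1 M monovalent salt, 50% formamide,
# with a perfectly matched probe and with 10% mismatch.
tm_matched = melting_temperature(1.0, 50.0, 50.0, 600)
tm_mismatched = melting_temperature(1.0, 50.0, 50.0, 600, mismatch_percent=10.0)
print(tm_matched, tm_mismatched)
```

Each percentage point of mismatch lowers the estimate by 1.5 degrees, which is why less homologous probes call for lower hybridization temperatures.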
In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be 
conveniently altered. The temperature of the hybridization and washes and the salt concentration 
during the washes are the simplest to adjust. As the temperature of the hybridization increases (ie. 
stringency), it becomes less likely for hybridization to occur between strands that are 
nonhomologous, and as a result, background decreases. If the radiolabeled probe is not completely 
homologous with the immobilized fragment (as is frequently the case in gene family and 
interspecies hybridization experiments), the hybridization temperature must be reduced, and 
background will increase. The temperature of the washes affects the intensity of the hybridizing 
band and the degree of background in a similar manner. The stringency of the washes is also 
increased with decreasing salt concentrations. 

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for a 
probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, 
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered 
and temperature adjusted accordingly, using the equation above. If the homology between the 
probe and the target fragment is not known, the simplest approach is to start with both 
hybridization and wash conditions which are nonstringent. If non-specific bands or high 
background are observed after autoradiography, the filter can be washed at high stringency and 
reexposed. If the time required for exposure makes this approach impractical, several hybridization 
and/or washing stringencies should be tested in parallel. 
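
The temperature guidance above can be expressed as a simple lookup. This is an illustrative sketch only; the function name is hypothetical, and because the stated ranges share their endpoints (90% and 95%), the sketch assumes that a boundary value takes the higher temperature.

```python
def hybridization_temperature_c(homology_percent):
    """Suggested hybridization temperature (degrees C) in 50% formamide.

    Returns None below 85% homology, where the text instead advises
    lowering the formamide content and recalculating from the Tm equation.
    Boundary handling (>=) is an assumption, not stated in the text.
    """
    if not 0 <= homology_percent <= 100:
        raise ValueError("homology must be between 0 and 100 percent")
    if homology_percent >= 95:
        return 42
    if homology_percent >= 90:
        return 37
    if homology_percent >= 85:
        return 32
    return None


for homology in (98, 92, 87, 70):
    print(homology, hybridization_temperature_c(homology))
```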

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid 
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said 
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex, 
which is stable enough to be detected. 

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention 
(including both sense and antisense strands). Though many different nucleotide sequences will 
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual 
sequence present in cells. mRNA represents a coding sequence and so a probe should be 
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and so 
a cDNA probe should be complementary to the non-coding sequence. 
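
The strand relationships described above can be illustrated with a short sketch. It is a hedged example only: the helper names and the example sequence are hypothetical, sequences are written as DNA (T rather than U) in the 5' to 3' direction, and only unambiguous bases are handled.

```python
# Translation table for complementing unambiguous DNA bases (assumed alphabet).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def probe_for_mrna(coding_strand):
    """Probe for mRNA: complementary to the coding sequence."""
    return reverse_complement(coding_strand)


def probe_for_first_strand_cdna(coding_strand):
    """Probe for single-stranded cDNA.

    First-strand cDNA is complementary to the mRNA, so a probe that is
    complementary to that cDNA has the same sequence as the coding strand.
    """
    first_strand_cdna = reverse_complement(coding_strand)
    return reverse_complement(first_strand_cdna)


coding = "ATGGCC"  # hypothetical coding-strand fragment, 5' to 3'
print(probe_for_mrna(coding))
print(probe_for_first_strand_cdna(coding))
```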

The probe sequence need not be identical to the Neisserial sequence (or its complement); some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe 
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can 
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may 
also be helpful as a label to detect the formed duplex. For example, a non-complementary 
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe 
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases 
or longer sequences can be interspersed into the probe, provided that the probe sequence has 
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and 
thereby form a duplex which can be detected. 

The exact length and sequence of the probe will depend on the hybridization conditions, such as 
temperature, salt condition and the like. For example, for diagnostic applications, depending on the 
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20 
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be 
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable 
hybrid complexes with the template. 

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al [J. 
Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al. [Proc. Natl. Acad. Sci. USA (1983) 
80:7461], or using commercially available automated oligonucleotide synthesizers. 

The chemical nature of the probe can be selected according to preference. For certain applications, 
DNA or RNA are appropriate. For other applications, modifications may be incorporated eg. 
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase 
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer 
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as 
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et 
al. (1993) TIBTECH 11:384-386]. 

The polymerase chain reaction (PCR) is another well-known means for detecting 
small amounts of target nucleic acids. The assay is described in: Mullis et al [Meth. Enzymol. 
(1987) 155:335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize 
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence 
that does not hybridize to the sequence of the amplification target (or its complement) to aid with 
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such 
sequence will flank the desired Neisserial sequence. 

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids is 
generated by the polymerase, they can be detected by more traditional methods, such as Southern 
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial 
sequence (or its complement). 

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook et 
al [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified 
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid 
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed to 
remove any unhybridized probe. Next, the duplexes containing the labelled probe are detected. 
Typically, the probe is labelled with a radioactive moiety. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for 
ORFs 37 (Fig. 1A-1E), 5 (Fig. 2A-2B), 2 (Fig. 3A-3D), 15 (Fig. 4A-4C), 22 (Fig. 5A-5C), 28 (Fig. 
6A-6B), 32 (Fig. 7A-7B), 4 (Fig. 8A-8F), 61 (Fig. 9), 76 (Fig. 10A-10C), 89 (Fig. 11), 97 (Fig. 
12A-12E), 106 (Fig. 13A-7C), 138 (Fig. 14A-B), 23 (Fig. 15A-15C), 25 (Fig. 16A-16E), 27 (Fig. 
17A-17B), 79 (Fig. 18A-18B), 85 (Fig. 19A-19D) and 132 (Fig. 20A-20C). M1 and M2 are 
molecular weight markers. Arrows indicate the position of the main recombinant product or, in 
Western blots, the position of the main N. meningitidis immunoreactive band. TP indicates 
N. meningitidis total protein extract; OMV indicates N. meningitidis outer membrane vesicle 
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲) 
shows GST control data; a circle (●) shows data with recombinant N. meningitidis protein. 
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an 
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et al 
(1989) J. Immunol 143:3007; Roberts et al (1996) AIDS Res Hum Retrovir 12:593; Quakyi et al 
(1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc. 
(1228 South Park Street, Madison, Wisconsin 53715 USA). 

Figure 21 shows an alignment comparison of amino acid sequences for ORF 4 for several strains
of Neisseria. Dark shading indicates regions of homology, and gray shading indicates the
conservation of amino acids with similar characteristics. The Figure demonstrates a high degree of
conservation among the various strains, further confirming the utility of ORF 4 as an antigen for
both vaccines and diagnostics.

EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N. meningitidis, along
with their putative translation products, and also those of N. gonorrhoeae. Not all of the nucleic
acid sequences are complete, ie. some encode less than the full-length wild-type protein.

The examples are generally in the following format: 

• a nucleotide sequence which has been identified in N. meningitidis (strain B) 

• the putative translation product of this sequence 

• a computer analysis of the translation product based on database comparisons 

• corresponding gene and protein sequences identified in N. meningitidis (strain A) and in 
N. gonorrhoeae 

• a description of the characteristics of the proteins which indicates that they might be 
suitably antigenic 

• results of biochemical analysis (expression, purification, ELISA, FACS etc.) 

The examples typically include details of sequence identity between species and strains. Proteins
that are similar in sequence are generally similar in both structure and function, and the sequence
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of
known function is widely used as a guide for the assignment of putative protein function to a new
sequence and has proved particularly useful in whole-genome analyses.

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also
Altschul et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences.

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used
to compare the ORFs (from GCG Wisconsin Package, version 9.0).

Dots within nucleotide sequences (eg. position 495 in SEQ ID NO: 11) represent nucleotides which
have been arbitrarily introduced in order to maintain a reading frame. In the same way,
double-underlined nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID NO: 11)
represent ambiguities which arose during alignment of independent sequencing reactions (some of
the nucleotide sequences in the examples are derived from combining the results of two or more
experiments).

Nucleotide sequences were scanned in all six reading frames to predict the presence of
hydrophobic domains using an algorithm based on the statistical studies of Esposti et al [Critical
evaluation of the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These
domains represent potential transmembrane regions or hydrophobic leader sequences.

Open reading frames were predicted from fragmented nucleotide sequences using the program 
ORFFINDER (NCBI). 
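
As an illustration of what scanning a sequence in all six reading frames involves, the following is a minimal Python sketch. It is not the ORFFINDER program itself: the function names, the ATG-to-stop definition of an ORF and the `min_codons` threshold are assumptions made for this example only.

```python
# Minimal six-frame ORF scan (illustrative sketch, not NCBI ORFFINDER).
# Assumption: an ORF runs from an in-frame ATG to the next in-frame stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_orfs(seq, min_codons=3):
    """List (strand, frame, start, end) for ATG..stop ORFs in all six frames.

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon. `min_codons` (hypothetical parameter) counts the
    codons before the stop.
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        results.append((strand, frame, start, i + 3))
                    start = None
    return results


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC"))
```

Open-ended ORFs (no stop codon before the end of a fragment) are ignored by this sketch, although such partial ORFs matter for fragmented sequences like those described above.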

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences in
the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional domains
were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE).



CHIR-0160 (356.001) PATENT 

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the
examples. For example, the proteins can be expressed recombinantly and used to screen patient
sera by immunoblot. A positive reaction between the protein and patient serum indicates that the
patient has previously mounted an immune response to the protein in question ie. the protein is an
immunogen. This method can also be used to identify immunodominant proteins.

The recombinant protein can also be conveniently used to prepare antibodies eg. in a mouse. These
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label
on the bacterial surface confirms the location of the protein.

In particular, the following methods (A) to (S) were used to express, purify and biochemically
characterise the proteins of the invention:

A) Chromosomal DNA preparation 

N. meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by
centrifugation, and resuspended in 5ml buffer (20% Sucrose, 50mM Tris-HCl, 50mM EDTA,
pH8). After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution
(50mM NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C
for 2 hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamylalcohol (24:1)
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70%
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA
concentration was measured by reading the OD at 260 nm.
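
The conversion from an OD reading at 260 nm to a DNA concentration is not spelled out in the text. The sketch below assumes the common approximation that an OD260 of 1.0 (1 cm path length) corresponds to about 50 µg/ml of double-stranded DNA. The function names and the dilution factor in the example are illustrative and do not come from the protocol.

```python
# Sketch: double-stranded DNA concentration from an OD260 reading.
# Assumption: OD260 of 1.0 ~ 50 ug/ml dsDNA at 1 cm path length.

DSDNA_UG_PER_ML_PER_OD = 50.0


def dsdna_concentration_ug_per_ml(od260, dilution_factor=1.0):
    """Concentration of the undiluted sample in ug/ml."""
    if od260 < 0 or dilution_factor <= 0:
        raise ValueError("od260 must be >= 0 and dilution_factor > 0")
    return od260 * DSDNA_UG_PER_ML_PER_OD * dilution_factor


def total_yield_ug(od260, dilution_factor, volume_ml):
    """Total DNA in the preparation, given its volume in ml."""
    return dsdna_concentration_ug_per_ml(od260, dilution_factor) * volume_ml


if __name__ == "__main__":
    # Hypothetical reading: OD260 = 0.5 on a 1:20 dilution of the 4ml preparation.
    print(dsdna_concentration_ug_per_ml(0.5, 20))  # concentration in ug/ml
    print(total_yield_ug(0.5, 20, 4.0))            # total yield in ug
```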

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF,
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A
sequence, adapted to the codon usage preference of meningococcus as necessary. Any predicted
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately
downstream from the predicted leader sequence.

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI,
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers
included a XhoI restriction site. This procedure was established in order to direct the cloning of
each amplification product (corresponding to each ORF) into two different expression systems:
pGEX-KG (using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or
NheI-XhoI).

5'-end primer tail: CGCGGATCCCATATG (SEQ ID NO: 1099) (BamHI-NdeI)
                    CGCGGATCCGCTAGC (SEQ ID NO: 1100) (BamHI-NheI)
                    CCGGAATTCTAGCTAGC (SEQ ID NO: 1101) (EcoRI-NheI)
3'-end primer tail: CCCGCTCGAG (SEQ ID NO: 1102) (XhoI)

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF;
the same 3' XhoI primer was used as before:

5'-end primer tail: GGAATTCCATATGGCCATGG (SEQ ID NO: 1103) (NdeI)

5'-end primer tail: CGGGATCC (BamHI)

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag
fusion. In this particular case, the predicted signal peptide was included in the final product.
NheI-BamHI restriction sites were incorporated using primers:

5'-end primer tail: GATCAGCTAGCCATATG (SEQ ID NO: 1104) (NheI)

3'-end primer tail: CGGGATCC (BamHI)

As well as containing the restriction enzyme recognition sequences, the primers included
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing
nucleotides depended on the melting temperature of the whole primer, and was determined for each
primer using the formulae:

Tm = 4 (G+C) + 2 (A+T)               (tail excluded)

Tm = 64.9 + 0.41 (% GC) - 600/N      (whole primer)

where N is the length of the primer in nucleotides.

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and
50-55°C for the hybridising region alone.
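
The two melting-temperature formulae can be applied as in the following Python sketch. The function names are illustrative, and N is taken to be the primer length in nucleotides. The example uses the ORF 1 forward primer from Table I (tail CGCGGATCCGCTAGC, hybridising region GGACACACTTATTTCGG).

```python
# Sketch of the two Tm formulae quoted above (function names are illustrative).


def tm_hybridising_region(region):
    """Tm = 4(G+C) + 2(A+T), applied to the hybridising region (tail excluded)."""
    s = region.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(primer):
    """Tm = 64.9 + 0.41(%GC) - 600/N, applied to the whole primer."""
    s = primer.upper()
    n = len(s)  # assumption: N is the primer length in nucleotides
    pct_gc = 100.0 * (s.count("G") + s.count("C")) / n
    return 64.9 + 0.41 * pct_gc - 600.0 / n


if __name__ == "__main__":
    tail = "CGCGGATCCGCTAGC"
    region = "GGACACACTTATTTCGG"
    print(tm_hybridising_region(region))            # 50
    print(round(tm_whole_primer(tail + region), 1))  # 70.5
```

For this primer the hybridising region gives 50°C and the whole oligo about 70.5°C, close to the 50-55°C and 65-70°C ranges stated above.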



Table I shows the forward and reverse primers used for each amplification. In certain cases, it will
be noted that the sequence of the primer does not exactly match the sequence in the ORF. When
initial amplifications were performed, the complete 5' and/or 3' sequence was not known for some
meningococcal ORFs, although the corresponding sequences had been identified in gonococcus.
For amplification, the gonococcal sequences could thus be used as the basis for primer design,
altered to take account of codon preference. In particular, the following codons were changed:
ATA->ATT; TCG->TCT; CAG->CAA; AAG->AAA; GAG->GAA; CGA->CGC; CGG->CGC;
GGG->GGC. Italicised nucleotides in Table I indicate such a change. It will be appreciated that,
once the complete sequence has been identified, this approach is generally no longer necessary.
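
The codon changes listed above can be expressed as a simple in-frame substitution. The following Python sketch is illustrative only: the function name is hypothetical, and it assumes the input is a coding sequence that starts in frame and has a length divisible by three.

```python
# Sketch: apply the codon changes listed above to an in-frame coding sequence.
# All eight substitutions are synonymous (they change only the third codon position).

CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Replace each listed codon, reading the sequence codon by codon."""
    s = coding_seq.upper()
    if len(s) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)


if __name__ == "__main__":
    print(adapt_codons("ATGCAGGGGATA"))  # ATGCAAGGCATT
```

Reading codon by codon matters: a triplet such as ATA that only appears out of frame (for example across the codons AAT AGG) is left unchanged.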



TABLE I - PCR primers

ORF | Primer | Sequence | Restriction sites
ORF 1 | Forward | CGCGGATCCGCTAGC-GGACACACTTATTTCGG (SEQ ID NO: 924) | BamHI-NheI
ORF 1 | Reverse | CCCGCTCGAG-CCAGCGGTAGCCTAATT (SEQ ID NO: 925) | XhoI
ORF 2 | Forward | GCGGATCCCATATG-TTTGATTTCGGTTTGGG (SEQ ID NO: 926) | BamHI-NdeI
ORF 2 | Reverse | CCCGCTCGAG-GACGGCATAACGGCG (SEQ ID NO: 927) | XhoI
ORF 2-1 | Forward | GCGGATCCCATATG-TTTGATTTCGGTTTGGG (SEQ ID NO: 928) | BamHI-NdeI
ORF 2-1 | Reverse | CCCGCTCGAG-TGATTTACGGACGCGCA (SEQ ID NO: 929) | XhoI
ORF 4 | Forward | GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC (SEQ ID NO: 930) | BamHI-NdeI
ORF 4 | Reverse | CCCGCTCGAG-TTTGGCTGCGCCTTC (SEQ ID NO: 931) | XhoI
ORF 5 | Forward | GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC (SEQ ID NO: 932) | NdeI-NcoI
ORF 5 | Forward | CGGGATCC-ATGGAAGGCGCACAAC (SEQ ID NO: 933) | BamHI
ORF 5 | Reverse | CCCGCTCGAG-GACTGTGCAAAAACGG (SEQ ID NO: 934) | XhoI
ORF 6 | Forward | CGCGGATCCCATATG-ACCCGTCAATCTCTGCA (SEQ ID NO: 935) | BamHI-NdeI
ORF 6 | Reverse | CCCGCTCGAG-TGCGCCGAACACTTTC (SEQ ID NO: 936) | XhoI
ORF 7 | Forward | CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC (SEQ ID NO: 937) | BamHI-NheI
ORF 7 | Reverse | CCCGCTCGAG-TTTCAAAATATATTTGCGGA (SEQ ID NO: 938) | XhoI
ORF 8 | Forward | GCGGATCCCATATG-GCTCAACTGCTTCGTAC (SEQ ID NO: 939) | BamHI-NdeI
ORF 8 | Reverse | CCCGCTCGAG-AGCAGGCTTTGGCGC (SEQ ID NO: 940) | XhoI
ORF [number illegible] | Forward | CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA (SEQ ID NO: 941) | BamHI-NdeI
ORF [number illegible] | Reverse | CCCGCTCGAG-TTTCCGAGGTTTTCGGG (SEQ ID NO: 942) | XhoI
ORF 10 | Forward | GCGGATCCCATATG-GACACAAAAGAAATCCTC (SEQ ID NO: 943) | BamHI-NdeI
ORF 10 | Reverse | CCCGCTCGAG-TAATGGGAAACCTTGTTTT (SEQ ID NO: 944) | XhoI
ORF 11 | Forward | GCGGATCCCATATG-GCGGTCAACCTCTACG (SEQ ID NO: 945) | BamHI-NdeI
ORF 11 | Reverse | CCCGCTCGAG-GGAAACGACTTCGCC (SEQ ID NO: 946) | XhoI
ORF 13 | Forward | CGCGGATCCCATATG-GCTCTGCTTTCCGCGC (SEQ ID NO: 947) | BamHI-NdeI
ORF 13 | Reverse | CCCGCTCGAG-AGGGTGTGTGATAATAAG (SEQ ID NO: 948) | XhoI
ORF 15 | Forward | GGAATTCCATATGGCCATGG-GCGGGACACTGACAG (SEQ ID NO: 949) | NdeI-NcoI
ORF 15 | Forward | CGGGATCC-TGCGGGACACTGACAGG (SEQ ID NO: 950) | BamHI
ORF 15 | Reverse | CCCGCTCGAG-AGGTTGGCCTTGTCTATG (SEQ ID NO: 951) | XhoI
ORF 17 | Forward | GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG (SEQ ID NO: 952) | NdeI-NcoI
ORF 17 | Forward | CGGGATCC-ATTGCCGGCCTGTTCG (SEQ ID NO: 953) | BamHI
ORF 17 | Reverse | CCCGCTCGAG-AAGCAGGTTGTACAGC (SEQ ID NO: 954) | XhoI
ORF 18 | Forward | GCGGATCCCATATG-ATTTTGCTGCATTTGGAT (SEQ ID NO: 955) | BamHI-NdeI
ORF 18 | Reverse | CCCGCTCGAG-TCTTCCAATTTCTGAAAGC (SEQ ID NO: 956) | XhoI
ORF 19 | Forward | GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC (SEQ ID NO: 957) | NdeI-NcoI
ORF 19 | Forward | CGGGATCC-TTCGCCAGTGTTTTTACCG (SEQ ID NO: 958) | BamHI
ORF 19 | Reverse | CCCGCTCGAG-GGTGTTTTTGAAGCTGCC (SEQ ID NO: 959) | XhoI
ORF 20 | Forward | GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG (SEQ ID NO: 960) | NdeI-NcoI
ORF 20 | Forward | CGGGATCC-TTCGGCGCGGGTATG (SEQ ID NO: 961) | BamHI
ORF 20 | Reverse | CCCGCTCGAG-CGGCGAGCGAGAGCA (SEQ ID NO: 962) | XhoI
ORF 22 | Forward | GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT (SEQ ID NO: 963) | NdeI-NcoI
ORF 22 | Forward | CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC (SEQ ID NO: 964) | BamHI
ORF 22 | Reverse | [sequence illegible] (SEQ ID NO: 965) | XhoI
ORF 23 | Forward | CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC (SEQ ID NO: 966) | BamHI-NdeI
ORF 23 | Reverse | CCCGCTCGAG-TTTAAACCGATAGGTAAACG (SEQ ID NO: 967) | XhoI
ORF 24 | Forward | [sequence illegible] (SEQ ID NO: 968) | NdeI-NcoI
ORF 24 | Forward | CGGGATCC-ATGATGCCGGAAATGGTG (SEQ ID NO: 969) | BamHI
ORF 24 | Reverse | CCCGCTCGAG-TGTCAGCGTGGCGCA (SEQ ID NO: 970) | XhoI
ORF 25 | Forward | GCGGATCCCATATG-TATCGCAAACTGATTGC (SEQ ID NO: 971) | BamHI-NdeI
ORF 25 | Reverse | [sequence illegible] (SEQ ID NO: 972) | XhoI
ORF 26 | Forward | GCGGATCCCATATG-CAGCTGATCGACTATTC (SEQ ID NO: 973) | BamHI-NdeI
ORF 26 | Reverse | CCCGCTCGAG-GACATCGGCGCGTTTT (SEQ ID NO: 974) | XhoI
ORF 27 | Forward | GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA (SEQ ID NO: 1168) | NdeI-NcoI
ORF 27 | Forward | CGGGATCC-CAGACCTATTCTGTTTATTTTAATC (SEQ ID NO: 975) | BamHI
ORF 27 | Reverse | CCCGCTCGAG-GGGTTCGATTAAATAACCAT (SEQ ID NO: 976) | XhoI
ORF 28 | Forward | GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT (SEQ ID NO: 977) | NdeI-NcoI
ORF 28 | Forward | CGGGATCC-AACGGCTGTACGTTGATG (SEQ ID NO: 978) | BamHI
ORF 28 | Reverse | CCCGCTCGAG-TTTGTCAGAGGAATTCGCG (SEQ ID NO: 979) | XhoI
ORF 29 | Forward | GCGGATCCCATATG-AACGGTTTGGATGCCCG (SEQ ID NO: 980) | BamHI-NdeI
ORF 29 | Forward | CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG (SEQ ID NO: 981) | BamHI-NheI
ORF 29 | Reverse | CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG (SEQ ID NO: 982) | XhoI
ORF 32 | Forward | CGCGGATCCCATATG-AATACTCCTCCTTTTG (SEQ ID NO: 983) | BamHI-NdeI
ORF 32 | Reverse | CCCGCTCGAG-GCGTATTTTTTGATGCTTTG (SEQ ID NO: 984) | XhoI
ORF 33 | Forward | GCGGATCCCATATG-ATTGATAGGGATCGTATG (SEQ ID NO: 985) | BamHI-NdeI
ORF 33 | Reverse | CCCGCTCGAG-TTGATCTTTCAAACGGCC (SEQ ID NO: 986) | XhoI
ORF 35 | Forward | GCGGATCCCATATG-TTCAGAGCTCAGCTT (SEQ ID NO: 987) | BamHI-NdeI
ORF 35 | Forward | CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT (SEQ ID NO: 988) | BamHI-NheI
ORF 35 | Reverse | CCCGCTCGAG-AAACAGCCATTTGAGCGA (SEQ ID NO: 989) | XhoI
ORF 37 | Forward | GCGGATCCCATATG-GATGACGTATCGGATTTT (SEQ ID NO: 990) | BamHI-NdeI
ORF 37 | Reverse | CCCGCTCGAG-ATAGCCCGCTTTCAGG (SEQ ID NO: 991) | XhoI
ORF 58 | Forward | CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT (SEQ ID NO: 992) | BamHI-NheI
ORF 58 | Reverse | CCCGCTCGAG-AGCATTGTCCAAGGGGAC (SEQ ID NO: 993) | XhoI
ORF 65 | Forward | GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG (SEQ ID NO: 994) | NdeI-NcoI
ORF 65 | Forward | CGGGATCC-TTGCTGTATCTGAATCAAGG (SEQ ID NO: 995) | BamHI
ORF 65 | Reverse | CCCGCTCGAG-CCGCATCGGCAGACA (SEQ ID NO: 996) | XhoI
ORF 66 | Forward | GCGGATCCCATATG-TACGCATTTACCGCCG (SEQ ID NO: 997) | BamHI-NdeI
ORF 66 | Reverse | CCCGCTCGAG-TGGATTTTGCAGAGATGG (SEQ ID NO: 998) | XhoI
ORF 72 | Forward | CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA (SEQ ID NO: 999) | BamHI-NdeI
ORF 72 | Reverse | CCCGCTCGAG-GCCTGAGACCTTTGCAA (SEQ ID NO: 1000) | XhoI
ORF 73 | Forward | GCGGATCCCATATG-AGATTTTTCGGTATCGG (SEQ ID NO: 1001) | BamHI-NdeI
ORF 73 | Reverse | CCCGCTCGAG-TTCATCTTTTTCATGTTCG (SEQ ID NO: 1002) | XhoI
ORF 75 | Forward | GCGGATCCCATATG-TCTGTCTTTCAAACGGC (SEQ ID NO: 1003) | BamHI-NdeI
ORF 75 | Reverse | CCCGCTCGAG-TTTGTTTTTGCAAGACAG (SEQ ID NO: 1004) | XhoI
ORF 76 | Forward | GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC (SEQ ID NO: 1005) | NheI-NdeI
ORF 76 | Reverse | CGGGATCC-TTACGGTTTGACACCGTT (SEQ ID NO: 1006) | BamHI
ORF 79 | Forward | CGCGGATCCCATATG-GTTTCCGCCGCCG (SEQ ID NO: 1007) | BamHI-NdeI
ORF 79 | Reverse | CCCGCTCGAG-GTGCTGATGCGCTTCG (SEQ ID NO: 1008) | XhoI
ORF 83 | Forward | GCGGATCCCATATG-AAAACCCTGCTGCTGC (SEQ ID NO: 1009) | BamHI-NdeI
ORF 83 | Reverse | CCCGCTCGAG-GCCGCCTTTGCGGC (SEQ ID NO: 1010) | XhoI
ORF 84 | Forward | GCGGATCCCATATG-GCAGAGATCTGTTTG (SEQ ID NO: 1011) | BamHI-NdeI
ORF 84 | Reverse | CCCGCTCGAG-GTTTGCCGATCCGACCA (SEQ ID NO: 1012) | XhoI
ORF 85 | Forward | CGCGGATCCCATATG-GCGGTTTGGGGCGGA (SEQ ID NO: 1013) | BamHI-NdeI
ORF 85 | Reverse | CCCGCTCGAG-TCGGCGCGGCGGGC (SEQ ID NO: 1014) | XhoI
ORF 89 | Forward | GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA (SEQ ID NO: 1015) | NdeI-NcoI
ORF 89 | Forward | CGGGATCC-GCCATACCTTCTTATCAGAG (SEQ ID NO: 1016) | BamHI
ORF 89 | Reverse | CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC (SEQ ID NO: 1017) | XhoI
ORF 97 | Forward | GCGGATCCCATATG-[hybridising region illegible] (SEQ ID NO: 1018) | BamHI-NdeI
ORF 97 | Reverse | CCCGCTCGAG-TTCGCCTACGGTTTTTTG (SEQ ID NO: 1019) | XhoI
ORF 98 | Forward | GCGGATCCCATATG-ACGGTAACTGCGG (SEQ ID NO: 1020) | BamHI-NdeI
ORF 98 | Reverse | CCCGCTCGAG-TTGTTGTTCGGGCAAATC (SEQ ID NO: 1021) | XhoI
ORF 100 | Forward | GCGGATCCCATATG-TCGGGCATTTACACCG (SEQ ID NO: 1022) | BamHI-NdeI
ORF 100 | Reverse | CCCGCTCGAG-ACGGGTTTCGGCGGAA (SEQ ID NO: 1023) | XhoI
ORF 101 | Forward | GCGGATCCCATATG-ATTTATCAAAGAAACCTC (SEQ ID NO: 1024) | BamHI-NdeI
ORF 101 | Reverse | CCCGCTCGAG-TTTTCCGCCTTTCAATGT (SEQ ID NO: 1025) | XhoI
ORF 102 | Forward | GCGGATCCCATATG-GCAGGGCTGTTTTACC (SEQ ID NO: 1026) | BamHI-NdeI
ORF 102 | Reverse | CCCGCTCGAG-AAACGGTTTGAACACGAC (SEQ ID NO: 1027) | XhoI
ORF 101 | Forward | GCGGATCCCATATG-AACCACGACATCAC (SEQ ID NO: 1028) | BamHI-NdeI
ORF 101 | Reverse | CCCGCTCGAG-CAGCCACAGGACGGC (SEQ ID NO: 1029) | XhoI
ORF 104 | Forward | GCGGATCCCATATG-ACGTGGGGAACGC (SEQ ID NO: 1030) | BamHI-NdeI
ORF 104 | Reverse | CCCGCTCGAG-GCGGCGTTTGAACGGC (SEQ ID NO: 1031) | XhoI
ORF 105 | Forward | GCGGATCCCATATG-ACCAAATTTCAAACCCCTC (SEQ ID NO: 1032) | BamHI-NdeI
ORF 105 | Reverse | CCCGCTCGAG-TAAACGAATGCCGTCCAG (SEQ ID NO: 1033) | XhoI
ORF 106 | Forward | GCGGATCCCATATG-AGGATAACCGACGGCG (SEQ ID NO: 1034) | BamHI-NdeI
ORF 106 | Reverse | CCCGCTCGAG-TTTGTTCCCGATGATGTT (SEQ ID NO: 1035) | XhoI
ORF [number illegible] | Forward | GCGGATCCCATATG-GAAGATTTATATATAATACTCG (SEQ ID NO: 1036) | BamHI-NdeI
ORF [number illegible] | Reverse | CCCGCTCGAG-ATCAGCTTCGAACCGAAG (SEQ ID NO: 1037) | XhoI
ORF 110 | Forward | AAAGAATTC-ATC[illegible]TAAATCCCGTAGATCTCCC (SEQ ID NO: 1038) | EcoRI
ORF 110 | Reverse | AAACTGCAG-GGAAAACCACATCCGCACTCTGCC (SEQ ID NO: 1039) | PstI
ORF 111 | Forward | AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA (SEQ ID NO: 1040) | EcoRI
ORF 111 | Reverse | AAACTGCAG-TCTGCGCGT[illegible]TTCGGGCAGGGTGG (SEQ ID NO: 1041) | PstI
ORF 113 | Forward | AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG (SEQ ID NO: 1042) | EcoRI
ORF 113 | Reverse | AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG (SEQ ID NO: 1043) | PstI
ORF 115 | Forward | AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG (SEQ ID NO: 1044) | EcoRI
ORF 115 | Reverse | AAAAAAGTCGAC-CTATTTTTTAGGGGC[illegible]TTTGC[illegible]TGTTTGAAAAGCCTGCC (SEQ ID NO: 1045) | SalI
ORF 119 | Forward | AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG (SEQ ID NO: 1046) | EcoRI
ORF 119 | Reverse | AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC (SEQ ID NO: 1047) | PstI
ORF 120 | Forward | AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG (SEQ ID NO: 1048) | EcoRI
ORF 120 | Reverse | AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT (SEQ ID NO: 1049) | PstI
ORF 121 | Forward | AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC (SEQ ID NO: 1050) | EcoRI
ORF 121 | Reverse | AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC (SEQ ID NO: 1051) | PstI
ORF 122 | Forward | AAAAAAGTCGAC-ATGTCTTACCGCGCAAGCAGTTCTCC (SEQ ID NO: 1052) | SalI
ORF 122 | Reverse | AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC (SEQ ID NO: 1053) | PstI
ORF 125 | Forward | AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT (SEQ ID NO: 1054) | EcoRI
ORF 125 | Reverse | AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG (SEQ ID NO: 1055) | PstI
ORF [number illegible] | Forward | AAAGAATTC-[illegible]GGAAACGGTCGAAG (SEQ ID NO: 1056) | EcoRI
ORF [number illegible] | Reverse | AAACTGCAG-TTAATCTTGTCTTCCGATATAC (SEQ ID NO: 1057) | PstI
ORF 127 | Forward | AAAGAATTC-ATGACTGATAATCGGGGGTTTACG (SEQ ID NO: 1058) | EcoRI
ORF 127 | Reverse | AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC (SEQ ID NO: 1059) | SalI
ORF 128 | Forward | AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC (SEQ ID NO: 1060) | EcoRI
ORF 128 | Reverse | AAACTGCAG-CTA[illegible]TGCAATGCGCCGCCGCGGGAATGTTTGAGCAGGCG (SEQ ID NO: 1061) | PstI
ORF 129 | Forward | AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG (SEQ ID NO: 1062) | EcoRI
ORF 129 | Reverse | AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG (SEQ ID NO: 1063) | PstI
ORF 130 | Forward | AAAGAATTC-GCAGTACTTGCCATTCTCGGTGCG (SEQ ID NO: 1064) | EcoRI
ORF 130 | Reverse | AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT (SEQ ID NO: 1065) | PstI
ORF 131 | Forward | GCGGATCCCATATG-GAAATTCGGGCAATAAAAT (SEQ ID NO: 1066) | BamHI-NdeI
ORF 131 | Reverse | CCCGCTCGAG-CCAGCGGACGCGTTC (SEQ ID NO: 1067) | XhoI
ORF 132 | Forward | GCGGATCCCATATG-AAAGAAGCGGGGTTTG (SEQ ID NO: 1068) | BamHI-NdeI
ORF 132 | Reverse | CCCGCTCGAG-CCAATCTGCCAGCCGT (SEQ ID NO: 1069) | XhoI
ORF 133 | Forward | CGCGGATCCCATATG-GAAGATGCAGGGCGCG (SEQ ID NO: 1070) | BamHI-NdeI
ORF 133 | Reverse | CCCGCTCGAG-AAACTTGTAGCTCATCGT (SEQ ID NO: 1071) | XhoI
ORF 134 | Forward | GCGGATCCCATATG-TCTGTGCAAGCAGTATTG (SEQ ID NO: 1072) | BamHI-NdeI
ORF 134 | Reverse | CCCGCTCGAG-ATCCTGTGCCAATGCG (SEQ ID NO: 1073) | XhoI
ORF 135 | Forward | GCGGATCCCATATG-CCGTCTGAAAAAGCTTT (SEQ ID NO: 1074) | BamHI-NdeI
ORF 135 | Reverse | CCCGCTCGAG-AAATACCGCTGAGGATG (SEQ ID NO: 1075) | XhoI
ORF 136 | Forward | CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC (SEQ ID NO: 1076) | BamHI-NheI
ORF 136 | Reverse | CCCGCTCGAG-TTCCGAATATTTGGAACTTTT (SEQ ID NO: 1077) | XhoI
ORF 137 | Forward | CGCGGATCCCATATG-GGCACGGCGGGAAATA (SEQ ID NO: 1078) | BamHI-NdeI
ORF 137 | Reverse | CCCGCTCGAG-ATAACGGTATGCCGCC (SEQ ID NO: 1079) | XhoI
ORF 138 | Forward | GCGGATCCCATATG-TTTCGTTTACAATTCAGGC (SEQ ID NO: 1080) | BamHI-NdeI
ORF 138 | Reverse | CCCGCTCGAG-CGGCGTTTTATAGCGG (SEQ ID NO: 1081) | XhoI
ORF 139 | Forward | GCGGATCCCATATG-GCTTTTTTGGCGGTAATG (SEQ ID NO: 1082) | BamHI-NdeI
ORF 139 | Reverse | CCCGCTCGAG-TAACGTTTCCGTGCGTTT (SEQ ID NO: 1083) | XhoI
ORF 140 | Forward | GCGGATCCCATATG-TTGCCCACAGGCAGC (SEQ ID NO: 1084) | BamHI-NdeI
ORF 140 | Reverse | CCCGCTCGAG-GACGATGGCAAACAGC (SEQ ID NO: 1085) | XhoI
ORF 141 | Forward | GCGGATCCCATATG-CCGTCTGAAGCAGTCT (SEQ ID NO: 1086) | BamHI-NdeI
ORF 141 | Reverse | CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT (SEQ ID NO: 1087) | XhoI
ORF 142 | Forward | GCGGATCCCATATG-GATAATTCTGGTAGTGAAG (SEQ ID NO: 1088) | BamHI-NdeI
ORF 142 | Reverse | CCCGCTCGAG-AAACGTATAGCCTACCT (SEQ ID NO: 1089) | XhoI
ORF 143 | Forward | GCGGATCCCATATG-GATACCGCTTTGAACCT (SEQ ID NO: 1090) | BamHI-NdeI
ORF 143 | Reverse | CCCGCTCGAG-AATGGCTTCCGCAATATG (SEQ ID NO: 1091) | XhoI
ORF 144 | Forward | GCGGATCCCATATG-ACCTTTTTACAACGTTTGC (SEQ ID NO: 1092) | BamHI-NdeI
ORF 144 | Reverse | CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG (SEQ ID NO: 1093) | XhoI
ORF 147 | Forward | GCGGATCCCATATG-TCTGTCTTTCAAACGGC (SEQ ID NO: 1094) | BamHI-NdeI
ORF 147 | Reverse | CCCGCTCGAG-TTTGTTTTTGCAAGACAG (SEQ ID NO: 1095) | XhoI



NB:

- restriction sites are underlined

- for ORFs 110-130, where the ORF itself carries an EcoRI site (eg. ORF 122), a SalI site
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (eg.
ORFs 115 and 127), a SalI site was used in the reverse primer.
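
The site-selection rule in the note above can be sketched in Python as follows. The tails are those that appear in Table I for ORFs 110-130. The function names are hypothetical, and raising an error when both primers would need SalI is an assumption of this sketch, since the text does not cover that case.

```python
# Sketch of the rule for ORFs 110-130: EcoRI (forward) and PstI (reverse) by
# default, SalI when the ORF itself carries the default enzyme's site.

RECOGNITION_SITES = {"EcoRI": "GAATTC", "PstI": "CTGCAG", "SalI": "GTCGAC"}
PRIMER_TAILS = {"EcoRI": "AAAGAATTC", "PstI": "AAACTGCAG", "SalI": "AAAAAAGTCGAC"}


def choose_sites(orf_seq):
    """Return (forward_enzyme, reverse_enzyme) for an ORF sequence.

    The three sites are palindromic, so scanning one strand is sufficient.
    """
    s = orf_seq.upper()
    forward = "SalI" if RECOGNITION_SITES["EcoRI"] in s else "EcoRI"
    reverse = "SalI" if RECOGNITION_SITES["PstI"] in s else "PstI"
    if forward == reverse:
        # Assumption: this case is not described in the text, so it is refused here.
        raise ValueError("ORF carries both EcoRI and PstI sites")
    return forward, reverse


def add_tail(enzyme, hybridising_region):
    """Prefix the hybridising region with the tail carrying the enzyme's site."""
    return PRIMER_TAILS[enzyme] + hybridising_region.upper()


if __name__ == "__main__":
    fwd, rev = choose_sites("ATGGAATTCAAACCC")  # toy ORF with an internal EcoRI site
    print(fwd, rev)                             # SalI PstI
    print(add_tail("EcoRI", "GCACCGCAAAAGGCAAAAACCGCA"))
```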

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by
addition of 0.3M Na-Acetate and 2 volumes ethanol. The samples were then centrifuged and the
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl.



C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template in
the presence of 20-40µM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including
1.5mM MgCl2), 2.5 units Taq DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase).

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine.

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed
using the hybridization temperature of the oligos excluding the restriction enzyme tail,
followed by 30 cycles performed according to the hybridization temperature of the whole length
oligos. The cycles were followed by a final 10 minute extension step at 72°C.

The standard cycles were as follows:

Cycles | Denaturation | Hybridisation | Elongation
First 5 cycles | 30 seconds, 95°C | 30 seconds, 50-55°C | 30-60 seconds, 72°C
Last 30 cycles | 30 seconds, 95°C | 30 seconds, 65-70°C | 30-60 seconds, 72°C

The elongation time varied according to the length of the ORF to be amplified. 
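
The two-stage cycling scheme described above can be written out as an explicit list of steps, as in the following Python sketch. The function names and step tuple layout are illustrative. The temperatures and times come from the text, with the hybridisation temperatures and elongation time left as parameters because they vary per primer pair and per ORF.

```python
# Sketch: expand the two-stage PCR programme into (step, temperature_C, seconds).


def build_program(low_hyb_c, high_hyb_c, elongation_s):
    """Hot start, 5 low-stringency cycles, 30 high-stringency cycles, final extension."""
    steps = [("hot start", 95, 3 * 60)]
    for n_cycles, hyb_c in ((5, low_hyb_c), (30, high_hyb_c)):
        for _ in range(n_cycles):
            steps.append(("denaturation", 95, 30))
            steps.append(("hybridisation", hyb_c, 30))
            steps.append(("elongation", 72, elongation_s))
    steps.append(("final extension", 72, 10 * 60))
    return steps


def total_minutes(steps):
    """Programmed hold time in minutes (ramping between temperatures is ignored)."""
    return sum(seconds for _, _, seconds in steps) / 60


if __name__ == "__main__":
    # Example values: 52°C and 68°C hybridisation, 60 s elongation.
    program = build_program(low_hyb_c=52, high_hyb_c=68, elongation_s=60)
    print(len(program), total_minutes(program))  # 107 83.0
```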

The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR 
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose 
gel and the size of each amplified fragment compared with a DNA molecular weight marker. 

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with
ethanol and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA
fragment corresponding to the right size band was then eluted and purified from gel, using the
Qiagen Gel Extraction Kit, following the instructions of the manufacturer. The final volume of the
DNA fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression of
the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI, SalI/PstI for cloning into pGex-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in either a 30 or 40µl final volume in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-His A, and pGex-His) 

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the
sample, and adjusted to 50µg/µl. 1µl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream to the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning

The fragments corresponding to each ORF, previously digested and purified, were ligated in both
pET22b and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated
using 0.5µl of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the
manufacturer. The reaction was incubated at room temperature for 3 hours. In some experiments,
ligation was performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's
instructions.
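
A 3:1 fragment/vector molar ratio translates into masses through the relative lengths of the two molecules. The Python sketch below uses the standard relation (insert mass = ratio x vector mass x insert length / vector length). The function name and the example sizes are hypothetical and are not taken from the protocol.

```python
# Sketch: insert mass needed for a given insert:vector molar ratio.
# For double-stranded DNA, moles are proportional to mass / length.


def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving `molar_ratio` insert molecules per vector molecule."""
    if min(vector_ng, vector_bp, insert_bp, molar_ratio) <= 0:
        raise ValueError("all arguments must be positive")
    return molar_ratio * vector_ng * insert_bp / vector_bp


if __name__ == "__main__":
    # Hypothetical example: 50 ng of a 5000 bp vector and a 1000 bp insert.
    print(insert_mass_ng(vector_ng=50, vector_bp=5000, insert_bp=1000))  # 30.0
```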

In order to introduce the recombinant plasmid in a suitable strain, 100µl E. coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately
200µl of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTRC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). The screening of the
positive clones was made on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115 &
127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µl/ml ampicillin.

G) Expression

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as
described above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for
initial cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD600 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification.

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.
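
The molecular-weight correction stated above can be illustrated with a minimal Python sketch. The function name and the example input are hypothetical and serve only to show the arithmetic; they are not part of the described method.

```python
# Illustrative sketch only: expected size of a GST-fusion band on SDS-PAGE.
GST_MW_KDA = 26.0  # MW of GST as stated in the text


def gst_fusion_mw_kda(orf_mw_kda: float) -> float:
    """Return the expected MW (kDa) of a GST-fusion: ORF product plus GST."""
    if orf_mw_kda <= 0:
        raise ValueError("ORF molecular weight must be positive")
    return orf_mw_kda + GST_MW_KDA


# Hypothetical example: an 11 kDa ORF product fused to GST.
print(gst_fusion_mw_kda(11.0))
```

A band at roughly this combined size, read against the M1 or M2 standards, would be consistent with the intact fusion.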

I) His-fusion solubility analysis (ORFs 111-129)

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 need urea and ORFs 125 and 127 need guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification.

A single colony was grown overnight at 37°C on a LB + Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.
The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in
2ml buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5)
and treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40
minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached the OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
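
For convenience, the formula above can be written as a short Python function. This is only a sketch of the stated arithmetic; the function name and the example absorbance readings are hypothetical.

```python
def protein_mg_per_ml(od280: float, od260: float) -> float:
    """Estimate protein concentration (mg/ml) from absorbance readings.

    Implements the formula quoted in the text:
        Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
    """
    return 1.55 * od280 - 0.76 * od260


# Hypothetical readings, for illustration only.
print(round(protein_mg_per_ml(0.50, 0.30), 3))
```

The OD260 term corrects the OD280 reading for nucleic acid contamination, so both readings should be taken on the same sample and blanked against the same dialysis buffer.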
L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending on
the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5 M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs 2,
4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis)

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% Glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were allowed to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200µl of diluted sera (Dilution buffer: 1% BSA, 0.1% Tween-20, 0.1%
NaN3 in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml of
citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and
OD490 was followed. The ELISA was considered positive when OD490 was 2.5 times that of the
respective pre-immune sera.
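
The positivity criterion above can be expressed as a simple comparison. The following Python sketch is illustrative only: the function name is hypothetical, and reading "2.5 times" as "at least 2.5 times" is an assumption, since the text does not state how borderline values were treated.

```python
def elisa_is_positive(od490_immune: float, od490_preimmune: float,
                      fold: float = 2.5) -> bool:
    """Return True when the immune serum OD490 reaches `fold` times the
    OD490 of the matching pre-immune serum (assumed '>=' threshold)."""
    if od490_preimmune <= 0:
        raise ValueError("pre-immune OD490 must be positive")
    return od490_immune >= fold * od490_preimmune


# Hypothetical readings, for illustration only.
print(elisa_is_positive(1.0, 0.3))
print(elisa_is_positive(0.5, 0.3))
```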

O) FACScan bacteria Binding Assay procedure.

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes containing 8ml each Mueller-Hinton Broth (Difco) containing 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
allowed to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer in each well. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.

P) OMV preparations 

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria 
disrupted by sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were 
removed by centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered 
by centrifugation at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from 
the crude outer membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and 
incubated at room temperature for 20 minutes. The suspension was centrifuged at 10000g for 10
minutes to remove aggregates, and the supernatant further ultracentrifuged at 50000g for 75
minutes to pellet the outer membranes. The outer membranes were resuspended in 10mM Tris-HCl,
pH8 and the protein concentration measured by the Bio-Rad Protein assay, using BSA as a
standard. 

Q) Whole Extracts preparation

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1 ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting 

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg)
derived from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3 %
Tris base, 1.44 % glycine, 20% methanol). The membrane was saturated by overnight incubation at
4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with the
Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.



S) Bactericidal assay

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected
and used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a
nutator and allowed to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml
Eppendorf tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was
washed once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5,
diluted 1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.
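
The text does not state how the two colony counts are combined. One common way to summarise such data, shown here purely as an assumption, is to express the time 1 count as a percentage of the time 0 count for each well. The function name and the example counts in this Python sketch are hypothetical.

```python
def percent_survival(cfu_time0: int, cfu_time1: int) -> float:
    """Colonies at time 1 as a percentage of colonies at time 0.

    Assumed summary statistic; the source only says the colonies
    at time 0 and time 1 hour were counted.
    """
    if cfu_time0 <= 0:
        raise ValueError("time 0 colony count must be positive")
    if cfu_time1 < 0:
        raise ValueError("time 1 colony count cannot be negative")
    return 100.0 * cfu_time1 / cfu_time0


# Hypothetical counts, for illustration only.
print(percent_survival(200, 50))
```

Under this assumption, a well with normal complement would be compared with the matching heat-inactivated complement control, where little or no killing is expected.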

Table II gives a summary of the cloning, expression and purification results.



TABLE II - Summary of cloning, expression and purification

ORF       PCR/cloning   His-fusion expression   GST-fusion expression   Purification
orf 1     +             +                       +                       His-fusion
orf 2     +             +                       +                       GST-fusion
orf 2.1   +             n.d.                    +                       GST-fusion
orf 4     +             +                       +                       His-fusion
orf 5     +             n.d.                    +                       GST-fusion
orf 6     +             +                       +                       GST-fusion
orf 7     +             +                       +                       GST-fusion
orf 8     +             n.d.                    n.d.
orf 9     +             +                       +                       GST-fusion
orf 10    +             n.d.                    n.d.
orf 11    +             n.d.                    n.d.
orf 13    +             n.d.                    +                       GST-fusion
orf 15    +             +                       +                       GST-fusion
orf 17    +             n.d.                    n.d.
orf 18    +             n.d.                    n.d.
orf 19    +             n.d.                    n.d.
orf 20    +             n.d.                    n.d.
orf 22    +             +                       +                       GST-fusion
orf 23    +             +                       +                       His-fusion
orf 24    +             n.d.                    n.d.
orf 25    +             +                       +                       His-fusion
orf 26    +             n.d.                    n.d.
orf 27    +             +                       +                       GST-fusion
orf 28    +             +                       +                       GST-fusion
orf 29    +             n.d.                    n.d.
orf 32    +             +                       +                       His-fusion
orf 33    +             n.d.                    n.d.
orf 35    +             n.d.                    n.d.
orf 37    +             +                       +                       GST-fusion
orf 58    +             n.d.                    n.d.
orf 65    +             n.d.                    n.d.
orf 66    +             n.d.                    n.d.
orf 72    +             +                       n.d.                    His-fusion
orf 73    +             n.d.                    +                       n.d.
orf 75    +             n.d.                    n.d.
orf 76    +             +                       n.d.                    His-fusion
orf 79    +             +                       n.d.                    His-fusion
orf 83    +             n.d.                    +                       n.d.
orf 84    +             n.d.                    n.d.
orf 85    +             n.d.                    +                       GST-fusion
orf 89    +             n.d.                    +                       GST-fusion
orf 97    +             +                       +                       GST-fusion
orf 98    +             n.d.                    n.d.
orf 100   +             n.d.                    n.d.
orf 101   +             n.d.                    n.d.
orf 102   +             n.d.                    n.d.
orf 103   +             n.d.                    n.d.
orf 104   +             n.d.                    n.d.
orf 105   +             n.d.                    n.d.
orf 106   +             +                       +                       His-fusion
orf 109   +             n.d.                    n.d.
orf 110   +             n.d.                    n.d.
orf 111   +             +                       n.d.                    His-fusion
orf 113   +             +                       n.d.                    His-fusion
orf 115   n.d.          n.d.                    n.d.
orf 119   +             +                       n.d.                    His-fusion
orf 120   +             +                       n.d.                    His-fusion
orf 121   +             n.d.                    n.d.
orf 122   +             +                       n.d.                    His-fusion
orf 125   +             +                       n.d.                    His-fusion
orf 126   +             +                       n.d.                    His-fusion
orf 127   +             +                       n.d.                    His-fusion
orf 128   +             n.d.                    n.d.
orf 129   +             +                       n.d.                    His-fusion
orf 130   +             n.d.                    n.d.
orf 131   +             +                       +                       n.d.
orf 132   +             +                       +                       His-fusion
orf 133   +             n.d.                    +                       GST-fusion
orf 134   +             n.d.                    n.d.
orf 135   +             n.d.                    n.d.
orf 136   +             n.d.                    n.d.
orf 137   +             n.d.                    +                       GST-fusion
orf 138   +             n.d.                    +                       GST-fusion
orf 139   +             n.d.                    n.d.
orf 140   +             n.d.                    n.d.
orf 141   +             n.d.                    n.d.
orf 142   +             n.d.                    n.d.
orf 143   +             n.d.                    n.d.
orf 144   +             n.d.                    +                       n.d.
orf 147   +             n.d.                    n.d.

Example 1 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 1): 



1 ATGAAACAGA CAGTCAA . AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 A . GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TAT . TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA
351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA
451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA
501 AGACCG... 

This corresponds to the amino acid sequence (SEQ ID NO: 2; ORF37): 



1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM
51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ
151 AQNNLGVMYA ERXRVRQD...

Further work revealed the complete nucleotide sequence (SEQ ID NO: 3): 



1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG 

151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG 

201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG 

251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG 

301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA 

351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG 

401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA

451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA

501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG 

551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 4; ORF37-1): 



1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM

51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA

101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ

151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY*
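
The correspondence between a nucleotide sequence and its amino acid sequence can be checked with a short Python sketch. It assumes the standard genetic code and translation in the first reading frame; the names `CODON_TABLE` and `translate` are hypothetical, and any codon containing an unresolved base is reported as X, as in the partial sequences above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unresolved codons become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the 10-base grouping spaces
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 60 bases of SEQ ID NO: 3, as printed above.
start = "ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT GAACCGAGCG"
print(translate(start))  # MKQTVKWLAAALIALGLNRA
```

The output matches the first 20 residues of SEQ ID NO: 4. Stop codons are reported as an asterisk, in line with the notation used in the listings.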

Further work identified the corresponding gene in strain A of N. meningitidis (SEQ ID NO: 5): 



1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG 

151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG 

201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC

251 AACGCCTGAA AGCGGGTTAT TGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 6; ORF37a): 



1 MKQTVKWLAA ALIALGLNQA VWADDVSDFR ENLQAAAQGN AAAQNNLGVM
51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY * 



The originally-identified partial strain B sequence (ORF37) (SEQ ID NO: 2) shows 68.0% identity 
over a 75aa overlap with ORF37a (SEQ ID NO: 6): 



                    10        20        30        40        50        60
orf37.pep   MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD
            |||||  |||||||||||  || |||||||||| |||||||||| ||| ||  |  || |
orf37a      MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf37.pep   DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG
             | |  |       |
orf37a      RALAQEWLGKACQNGYQDSCDNDQRLKAGYX
                    70        80        90
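
The percentage identity quoted for an alignment is the number of identical aligned residues divided by the length of the overlap. The following Python sketch illustrates that arithmetic on the 75aa overlap between ORF37 and ORF37a. The function name is hypothetical, and counting an unresolved residue (X) as a mismatch is an assumption, although it reproduces the reported figure here.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent of aligned columns holding the same residue in both rows.

    Both rows must be the same length; '-' marks a gap and never
    counts as a match.
    """
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("rows must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * matches / len(row_a)


# The 75 aligned residues of ORF37 (SEQ ID NO: 2) and ORF37a (SEQ ID NO: 6).
orf37 = ("MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD"
         "DAEAVRWYRQPAEQG")
orf37a = ("MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD"
          "RALAQEWLGKACQNG")
print(round(percent_identity(orf37, orf37a), 1))  # 68.0
```

Here 51 of the 75 aligned positions are identical, giving 68.0%.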



CHIR-0160 (356.001) 






PATENT 



Further work identified the corresponding gene in N. gonorrhoeae (SEQ ID NO: 7):

1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT 

51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC 

101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG

151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG 

201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG 

251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT 

301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA 

351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A

This encodes a protein having amino acid sequence (SEQ ID NO: 8; ORF37ng): 

1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM
51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA
101 QQWLGKACQN GDQNSCDNDQ RLKAGY*

The originally-identified partial strain B sequence (ORF37) (SEQ ID NO: 2) shows 64.9% identity
over a 111 aa overlap with ORF37ng (SEQ ID NO: 8):

orf37.pep   MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD   60
            |||||  |||||||||||  ||  ||||||||| || ||||||| ||| ||     || |
orf37ng     MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD   60

orf37.pep   DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG  120
               || |||   ||| |||||||| ||  || ||||   |  |   |   |
orf37ng     YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ  120

orf37.pep   VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD  168

orf37ng     RLKAGY  126

The complete strain B sequence (ORF37-1) (SEQ ID NO: 4) and ORF37ng (SEQ ID NO: 8) show
51.5% identity in 198 aa overlap:

                    10        20        30        40        50        60
orf37-1.pep MKQTVKWLAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD
            |||||||||||||||||| |||| |||||||||||| ||||||| ||| ||  | ||| |
orf37ng     MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf37-1.pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG
               || ||| | ||| |||||||| ||  |||||||
orf37ng     YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD------------------------
                    70        80        90

                   130       140       150       160       170       180
orf37-1.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC
                                                             |||| | ||||
orf37ng     ------------------------------------------------LALAQQWLGKAC
                                                             100

                   190      199
orf37-1.pep QNGDQDGCDNDQRLKAGYX
            |||||  |||||||||||
orf37ng     QNGDQNSCDNDQRLKAGYX
           110       120

Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF37-1 (SEQ ID NO: 4) (11kDa) was cloned in pET and pGex vectors and expressed in E.coli,
as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figure 1A shows the results of affinity purification of the GST-fusion protein, and Figure
1B shows the results of expression of the His-fusion in E.coli. Purified GST-fusion protein was
used to immunise mice, whose sera were used for ELISA (positive result), FACS analysis (Figure
1C), and a bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 (SEQ ID NO:
4) is a surface-exposed protein, and that it is a useful immunogen.

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1 (SEQ
ID NO: 4).



Example 2 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 9):

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA
GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 
TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 
ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 
GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 
CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 
TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 



This corresponds to the amino acid sequence (SEQ ID NO: 10):



1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 
101 TSFAEKNADG GNAEKAAE* 






Computer analysis of this amino acid sequence gave the following results: 




Homology with a hypothetical H.influenzae protein (yrbd.haein; accession number p45029 (SEQ
ID NO: 1105))

SEQ ID NO: 9 and yrbd.haein (SEQ ID NO: 1105) show 48.4% aa identity in 122 aa overlap:

yrbd.h      LGIGALVPLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGVVIGRVSAITLDE
                                          |  |||||| || |  ||  ||| || ||
N.m                                       FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP

yrbd.h      KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT
            |||   |       |       |  | |||||||||| |  |   |||  |  |  |  |
N.m         KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG---GDTENLAAGDTISVT

yrbd.h      TSAMVLEDLIGQFL--YGSKKSDGNEKSESTEQ
             |||||| ||| |      |  ||       |
N.m         SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX

Homology with a predicted ORF from N. gonorrhoeae 

SEQ ID NO: 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N.
gonorrhoeae (SEQ ID NO: 1106yrbx): 



yrbd        GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP
                                          ||||||||||||||||||||||||||||||
N.m                                       FGDIGGLKVNAPVKSAGVLVGRVGAIGLDP

yrbd        KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
N.m         KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM

yrbd        VLENLIGKFMTSFAEKNAEGGNAEKAAEX
            |||||||||||||||||| |||||||||
N.m         VLENLIGKFMTSFAEKNADGGNAEKAAEX

The complete yrbd H.influenzae sequence has a leader sequence and it is expected that the
full-length homologous N. meningitidis protein will also have one. This suggests that it is either a
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its
epitopes, could be a useful antigen for vaccines or diagnostics.

Example 3



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 11):



1 . . ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 

51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 

101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 

151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 

201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC

251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC

301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCGTGCGAT GTTTGGTATA

401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT
451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA . aCCAT
501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 
551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 
601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 
651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 
701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 
751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 
801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 
851 AAGCGGTCG. . 



This corresponds to the amino acid sequence (SEQ ID NO: 12; ORF3): 



1 . . ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG

51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN

101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV

151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE

201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE

251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV. .

Further sequence analysis revealed the complete nucleotide sequence (SEQ ID NO: 13): 



1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 

201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC
301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC
451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 
601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 
651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 
701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 
851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 
901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG

1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG
1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 
1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence (SEQ ID NO: 14; ORF3-1):

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD
51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI
101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD
151 EKFACDVWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR
201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL
251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT
301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS
351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA
401 KPLPRKNPET STA*
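
The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked by translating codon by codon. The following is a minimal Python sketch, assuming the standard genetic code; the function name `translate` and the way the codon table is built are illustrative choices and are not part of the original disclosure. It translates the first 30 nucleotides of SEQ ID NO: 13 as printed above.

```python
from itertools import product

# Standard genetic code written in TCAG order (illustrative helper, not from the source).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 1; unknown codons become 'X'."""
    dna = "".join(dna.upper().split())  # drop the block spacing used in the listing
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 13.
print(translate("ATGAGTAAAT TCTTCAAACG CCTGTTTGAC"))  # MSKFFKRLFD
```

Codons containing an ambiguous base (for example N) are reported as X in this sketch, which mirrors the use of X for undetermined residues in the listings.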


Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 (SEQ ID NO: 12) shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) (SEQ
ID NO: 16) from strain A of N. meningitidis:

20 10 20 30 

orf3.pep I L I YL I RKNLGS PVFFFQERPGKDGKP FKMVKFR 

M I I I I I I I I M I I I I I I I I I I I I I I I I I I 
orf 3a MSKFFKRLFD IVASA SGLIFLSPVFLI LI YLI RKNLGS PVFFFQERPGKDGKP FKMVKFR 

10 20 30 40 50 60 

25 40 50 60 70 80 90 

orf 3 . pep SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 

I I : 1 : I I I I I I I I I I I I I M I Ml i I M M I I I - I I hi I I I M I ' I I I I I I I 
orf 3a SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

30 100 110 120 130 140 150 

orf 3 . pep YDNFONRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFS LCLDIKILLLTVKKVL 
i I I I I I I I I I I I I I I I I I I M I I I I I I I I h I I I h M I I I I I I I I I I I I Ml I I i I I 
o r f 3 a YDNFQNRRHEMKPG I TGWAQVNGRNALS WDERFACD I WY I DHFS LCLD I KI LLLTVKKVL 

130 140 150 160 170 180 

35 160 170 180 190 200 210 

orf 3 . pep IKEGISAQGEXTMPPFTGKRKLAVVGAGGHGKVVADLAAALGRYREIVFLDDRAQGSVNG 

Tlllllllll I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I III llhlillll 
or f 3 a I KEG I S AQGE ATM P P FTGKRKLA WGAGGHGKWAELAAALGT YGE I VFLDDRVQGS VNG 

190 200 210 220 230 240 

40 220 230 240 250 260 270 

orf 3 . pep FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
I I I I I I I I I I I I I I I h hi n I I I M I I I I I I I I I I I I I I I I I h ' I h I I I I I I I 
orf 3a FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 

250 260 270 280 290 300 






280
orf3.pep VGQGSVVMAKAV
1111=1111111 

orf3a VGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 

310 320 330 340 350 360 

The complete length ORF3a nucleotide sequence (SEQ ID NO: 15) is:



1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA
101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA
201 [illegible] ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA
251 AAAAAPTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC
301 [illegible] ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA
351 [illegible] [illegible] TCCAAAACCG CCGCCACGAA ATGAAACCGG
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC
451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT
501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG
551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC
601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT
651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG
701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT
751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA
801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG
851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA
901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG
951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG
1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG
1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG
1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG
1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA
1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA

This is predicted to encode a protein having amino acid sequence (SEQ ID NO: 16):



1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD
51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV
101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD
151 ERFACDIWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR
201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL
251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA
401 KPLAGKNTET LRS*

Two transmembrane domains are underlined. 
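
The underlining that marked the transmembrane domains has not survived in this text, and the specification does not state which prediction method was used. As a rough illustration only, hydrophobic stretches can be located with a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6, which are conventional choices for that scale and not values taken from this document; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale. Using this scale is an assumption made for
# illustration; the source does not say which prediction method was applied.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence."""
    residues = "".join(seq.split())  # drop the block spacing used in the listing
    scores = [KD[aa] for aa in residues]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean meets the threshold."""
    return [
        start + 1
        for start, mean in enumerate(hydropathy_profile(seq, window))
        if mean >= threshold
    ]


# First 40 residues of SEQ ID NO: 16 (ORF3a).
n_terminus = "MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV"
print(hydrophobic_windows(n_terminus))
```

Windows that pass the cut-off are only candidates; a dedicated topology predictor would be needed to delimit actual membrane-spanning segments.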



ORF3-1 (SEQ ID NO: 14) shows 94.6% identity in 410 aa overlap with ORF3a (SEQ ID NO: 16): 



10 20 30 40 50 60 

orf3a.pep MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

1 1 i 1 1 1 1 1 1 1 M : I II 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf3-l MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf3a.pep SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL
IhlllllMI III IIIIMIIIIII IIIIIIIIMI MIMIIIIMIIMMIM

orf3-1 SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 3a . pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 

i 1 1 1 1 MM 1 1 1 1 1 1 1 1 M 1 1 1 1 M : II hi I M :| M 1 1 1 1 1 1 M 1 1 1 1 1 II 1 1 1 1 1 

orf 3-1 YDNFQNRRHEMKPG I TGWAQVNGRNALSWDEKFACDVWY I DHFSLCLD I KI LLLTVKKVL 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 3a . pep IKEGISAQGEATMPPFTGKRKLAWGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG 

I M M 1 1 1 M 1 1 1 1 M 1 1 1 M M 1 1 M I M 1 1 1 M M M M I IIIIIMhIIIIII 

orf 3 - 1 IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 3a. pep FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 

I 1 1 II h I II 1 1 1 h h I M h h 1 1 1 1 M 1 1 M 1 1 1 1 M 1 1 1 hi h h 1 1 1 1 1 

orf 3 - 1 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 3a . pep VGQGGWMAKAWQADS VLKDGV I VNTAATVDHDCLLDAFVH IS PGAHLS GNTR I GEE SW 

MlhlllllMIII III II 1 1 III 1 1 1 III I IIMMMMMII IMM II I II 

orf 3 - 1 VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 






370 380 390 400 410 

orf 3a . pep IGTGACSRQQIRIGS RAT I GAGAWVRDVSDGMTVAGN PAKPLAGKNTETLRSX 

MMMIIMMMIMI MM III IMMMIMII Mill II II 

orf 3 - 1 IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 

370 380 390 400 410 
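
The identity figures quoted for these alignments are the proportion of aligned positions at which the two sequences carry the same residue. A minimal Python sketch of that calculation follows, applied to one 60-residue row (positions 61 to 120) of the ORF3a / ORF3-1 alignment above. The function name is hypothetical, and skipping gap columns is one convention among several; alignment programs differ in whether gap columns count toward the overlap length.

```python
def percent_identity(seq_a: str, seq_b: str):
    """Return (identical, compared, percent) for two equal-length aligned rows.

    Columns holding a gap character '-' in either row are skipped. Other tools
    may count such columns toward the overlap, so this is only one convention.
    """
    a = "".join(seq_a.split())
    b = "".join(seq_b.split())
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return identical, compared, 100.0 * identical / compared


# Positions 61-120 of ORF3a (SEQ ID NO: 16) and ORF3-1 (SEQ ID NO: 14).
orf3a_row = "SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL"
orf3_1_row = "SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL"

identical, compared, pct = percent_identity(orf3a_row, orf3_1_row)
print(f"{identical}/{compared} identical ({pct:.1f}%)")  # 56/60 identical (93.3%)
```

The figure for this single row differs from the 94.6% reported for the whole 410 aa overlap because the differences are not spread evenly along the sequence.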



Homology with hypothetical protein encoded by yvfc gene (accession Z71928) (SEQ ID NO: 1108) of B. subtilis






ORF3 (SEQ ID NO: 12) and YVFC proteins (SEQ ID NO: 1108) show 55% aa identity in 170 aa 
overlap (BLASTp): 



ORF3 3 IYLIRKNLGSPVFFFQERPGKDGKPFKMVKFRSMRDGLYSDGIPLPDGERLTPFGKKLRA 62
 I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R
yvfc 27 IAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 86

ORF3 63 [sequence line lost in source]
 S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S
yvfc 87 [sequence line lost in source]

ORF3 123 [sequence line lost in source]
 W++KF DVWY+D++S LD EGI FTG
yvfc 147 [sequence line lost in source]



Homology with a predicted ORF from N. gonorrhoeae

ORF3 (SEQ ID NO: 12) shows 86.3% identity over a 286aa overlap with a predicted ORF
(ORF3.ng) (SEQ ID NO: 18) from N. gonorrhoeae:

orf3 ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 34 

: 1 1 i I M 1 1 1 1 1 1 1- I i 1 1 M 1 1 i 1 1 1 Ml I 

5 orf 3ng MSKAVKRLFDIIASA SGLIVLSPVFLVLIYLI RKNKGSPVFFIRERPGKDGKPFKMVKFR 60 

orf3 SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 94 

Illhl lllllllhllll llllllhl 1 1 1 1 M hi ! 1 , 1 1 1 1 i I M 1 1 1 1 IN 

orf3ng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

orf 3 YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 154 

10 ::|M!IIIIIIIII llllllll llllllhlllll hlh IhllhlllllM 

or f 3 ng YNKFQNRRHEMKPG I TGWAQVNGRNALSWDEKFS CDVWYTDNFS FWLDMKI LFLTVKKVL 180 

orf 3 I KEGI SAQGEXTMPPFTGKRKLAWGAGGHGKVVADLAAALGRYRE I VFLDDRAQGSVNG 2 14 

Ml Ml 1 1 II I II Ihhllll hi II MM II hi 1 1 II I I 1 1 1 1 M I h 1 1 1 II I 

orf 3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKWAELAAALGTYGEIVFLDDRTQGSVNG 24 0 

15 orf 3 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT " 2 74 

I I I I I : I I I I I I I U M - I I I, M I I I I I M : I I I I M I I I I = I I I I I I t I I I 
orf 3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 300 

orf 3 VGQGSWMAKAV 286 

MIMMMIM 

20 orf3ng IGQGSVVMAKAWQAGSVLKDGVIWTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 

The complete length ORF3ng nucleotide sequence (SEQ ID NO: 17) is: 



1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 

51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 

101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc

151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 

201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 

251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 

301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 

351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

4 51 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 

501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 

551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 

601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 

701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 

851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA

901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG

1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 



This encodes a protein having amino acid sequence (SEQ ID NO: 18):

1 MSKAVKRLFD IIASASGLIV LSPVFLVLIY LIRKNLGSPV FFIRERPGKD
51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV
101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD
151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR
201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL
251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI
301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS
351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA
401 KPLTGKNPKT GTA*


This protein shows 86.9% identity in 413 aa overlap with ORF3-1 (SEQ ID NO: 14): 






10 20 30 40 50 60 

orf 3 - 1 . pep MSKFFKRLFDIVASASGLIFLSPVFLILI YLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 

. Ill 1 1 1 1 1 : 1 1 1 1 ! I 1 1 1 : 1 i H 1 II 1 1 1 1 II 1 1 1 l-l M 1 1 1 1 1 1 1 1 1 : 1 1 

orf3ng MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 3-1 .pep SMRDALDSDG I PLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

llllllll llllh II 1 1 1 1 1 M 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 3ng SMRDALDSDG I PLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 3 - 1 . pep YDNFQNRRHEMKPGI TGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDI KI LLLTVKKVL 

|::|| I I II I II II II I II II II I II II III I hi III I hlh I hi Ihl II I III 
orf3ng YNKFQNRRHEMKPG I TGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKI LFLTVKKVL 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 3 - 1 . pep IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 

1 1 1 1 1 1 1 1 1 1 1 1 1 M hM 1 1 1 M 1 1 1 1 1 1 II hi 1 1 1 1 1 I IIIUIMI III 

orf 3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKVVAELAAALGTYGEIVFLDDRTQGSVNG 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 3 - 1 . pep FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

I lllllll lllllllhhhllllllllllh hill hlh llllllll 
orf 3ng FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 3 - 1 . pep VGQGSVVMAKAVVQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

h.llllllllllllllllll hllllhlllll hlhlhhhl lllhlll 
orf 3ng IGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 

310 320 330 340 350 360 






370 380 390 400 410 

orf 3-1 .pep IGTGACSRQQ I RIGSRAT I GAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 

llllllllll :|| h I Ihl I I I 1 I 1 i I I t I I I E llhhlll 
orf 3ng IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 

370 380 390 400 410 



In addition, ORF3ng (SEQ ID NO: 18) shows significant homology with a hypothetical protein (SEQ ID NO: 1110) from B. subtilis:

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis]
>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis]
>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide
biosynthesis [Bacillus subtilis] Length = 202

Score = 235 bits (594), Expect = 3e-61

Identities = 114/195 (58%), Positives = 142/195 (72%)



Query: 5 VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64
 +KRLFD+ A+ L S + L I ++R +GSPVFF + RPG GKPF + KFR+M D
Sbjct: 3 LKRLFDLTAAIFLLCCTSVIILFTIAVVRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62

Query: 65 ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124
 DS G LPD RLT G+ +R S+DELP+L NVLKG++SLVGPRPLLM YLPLY +
Sbjct: 63 ERDSKGNLLPDEVRLTKTGRLIRKLSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEK 122

Query: 125 QNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVLIKEG 184
 Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG
Sbjct: 123 QARRHEVKPGITGWAQINGRNAISWEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEG 182

Query: 185 ISAQGEATMPPFAGN 199
 I T F G+
Sbjct: 183 IQQTNHVTAERFTGS 197
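
The percentages printed in the BLAST report above (Identities = 114/195, Positives = 142/195) can be reproduced from the raw counts. The sketch below is an illustration only: the function name is hypothetical, and the use of truncation rather than rounding is an inference from the printed values (142/195 is about 72.8%, yet 72% is reported), not a documented statement about the program.

```python
def blast_percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated rather than rounded.

    Truncation is assumed here because it reproduces the figures printed in
    the report; it is not taken from any program documentation.
    """
    return count * 100 // length


print(blast_percent(114, 195))  # identities: 58
print(blast_percent(142, 195))  # positives: 72
```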





The hypothetical product of yvfc gene shows similarity to EXOY of R.meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 4 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 19):

1 . . AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT . GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 



This corresponds to amino acid sequence (SEQ ID NO: 20; ORF5): 

1 . . NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS .... XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be (SEQ ID NO: 21): 



1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 

301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence (SEQ ID NO: 22; ORF5-1):



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS
201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR
251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT*

Further work identified the corresponding gene in strain A of N. meningitidis (SEQ ID NO: 23):



1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 

101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

2 01 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 

2 51 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 
301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

3 51 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 

4 01 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 
4 51 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 
501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 
551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 
601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 
651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 
701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 
751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 
801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 
851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 
901 TAA

This encodes a protein having amino acid sequence (SEQ ID NO: 24; ORF5a): 



1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 

51 KVLDFSDLEV RDAMITRSRM NVLKENDS IE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 

201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 

251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT

301 *

The originally-identified partial strain B sequence (ORF5) (SEQ ID NO: 20) shows 54.7% identity 
over a 124aa overlap with ORF5a (SEQ ID NO: 24): 

5 10 20 30 

orf 5 .pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

1 I I ■ ] : I I I I I I 1 I I I I I I I I 1 I I J = I 

orf 5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 
130 140 150 160 170 180 

10 40 50 60 70 80 90 

orf 5 .pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 

I I I II IM I I I I I I I I - Ml I I I MIMIIMl IIMM III :|| I 
orf 5a EDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGTPA 
190 200 210 220 230 240 

15 100 110 120 130 

orf 5 . pep RARRKS PYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 

INI III I I hi M I IIMIIMM 

orf 5a RARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 
250 260 270 280 290 300 


The complete strain B sequence (ORF5-1) (SEQ ID NO: 22) and ORF5a (SEQ ID NO: 24) show 
92.7% identity in 300 aa overlap: 

10 20 30 40 50 60 

orf 5a. pep MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

25 I I I I I I I I I I I I I I I I I I I I I I I I I II = I I I I I I I I I I I I I M I I I I I I I I I I I I I II 

orf 5 - 1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 5a. pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

30 MIMMIMIMM I MMMI MMMMMMMMMMMMIMMIMI 

orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 5a. pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

35 Ml I MINIMI II I II II MM l-MIIIM MINI MM I III III MINN III I 

orf 5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

130 140 150 160 170 180' 

190 200 210 220 230 240 

orf 5a. pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 

40 : I I I I I I I I : I I I I I I I I h I I I I I I I I I I I I I I h I I I I II I I I I I I I I I I I = I I 

orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 

190 200 210 220 230 

250 260 270 280 290 300 

orf 5a . pep PARARRKSXYRRXAXHXRXRXQP P PAYADGDPREVSS AVS VQFRMTVRAFS VS I RP I RXT 

45 || I III I III I I |:| II 1 1 1 1 1 1 II 1 1 1 M M M I II 1 1 1 1 II 1 1 II 1 1 II I 

orf 5-1 S ARARRKS PYRRFAVHRRTRRQP P PAYADGDPREVSTAVS AQFRMTVRAFS VS I RP I RQT 

240 250 260 270 280 290

Further work identified a partial DNA sequence in N. gonorrhoeae (SEQ ID NO: 25) which
encodes a protein having amino acid sequence (SEQ ID NO: 26; ORF5ng):



1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 

51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 

101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 

151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 

201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 

251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 



Further analysis revealed the complete gonococcal nucleotide sequence (SEQ ID NO: 27) to be: 



1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 

151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 

2 01 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 

2 51 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 
301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

3 51 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 

4 01 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 
4 51 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 
501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 
551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 
6 01 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 
651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 
701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 
751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 
801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 
8 51 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 
901 ATCCGCCAAA CATAA 



This encodes a protein having amino acid sequence (SEQ ID NO: 28; ORF5ng-1):

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP
301 IRQT* 

The originally-identified partial strain B sequence (ORF5) (SEQ ID NO: 20) shows 83.1% identity
over a 135aa overlap with the partial gonococcal sequence (ORF5ng) (SEQ ID NO: 26):



or f 5 NHMAI VIDEYGGTSGLVTFEDI IEQIVGEI 3 0 

I I I I I I I I I I I I I I I I II I I I I I I I I I I : I 

orf 5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAI VIDEYGGTSGLVTFEDI I EQ I VGD I 182 

or f 5 EDEFDEDDSADNI HAVS SDTWRI HAATE I ED INTFFGTE YS I EEADT IXRPGHSRVGTSA 9 0 

Illlllhllhlhll:: MIIIIIIM iMIIIIh I I I i I I I Ml HI I 

orf 5ng EDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 



orf5 RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSX RRFCTV 131
III 1 II I M I MM II 1 1 1 1 1 1 1 = 1 1 1 1 1 1 1 1 1 IMM

orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) (SEQ ID NO: 22 & SEQ
ID NO: 28) show 92.4% identity in 304 aa overlap:

10 20 30 40 50 60 

orf 5ng- 1 . pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 

IMIMI MIMM IMMIMMI IMMMMMIIMIMM IIIMIIhMII 

orf 5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 
10 10 20 30 40 50 60 

70 80 90 100 110 120 

orf5ng-1.pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP

Mill MIIIIIIIIIIMI IMIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMI I 

orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
15 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 5ng- 1 . pep EQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 

1 1 1 1 M hi 1 1 1 1 1 1 1 1 1 1 M ' 1 1 1 M i 1 1 ' II 1 1 1 1 1 1 1 i II 1 1 1 1 M 1 1 1 1 1 1 1 1 M I 

orf 5-1 EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
20 130 140 150 160 170 180 



190 200 210 220 230 240 

orf 5ng-l.pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT 

: I I I I M I I : I ! I : I I : I I : I I I I I I I I I I I I I I I : I I I I I I : I I II I I I I III =11 
orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 
25 190 200 210 220 230 



250 260 270 280 290 300 

orf 5ng-l . pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMTVRSFSVSIRP 

IIIIIIIIIIIIIMII llllllhlllllllll M I I I I I M I h I M I I I 

orf 5- 1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS TAVSAQFRMTVRAFSVSIRP 

30 240 250 260 270 280 290 



orf 5ng-l .pep IRQTX 

Mill 

orf 5-1 IRQTX 
35 300 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and 
identified the following homologies: 

Homology with hemolysin homolog TlyC (accession U32716) (SEQ ID NO: 1111) of H. influenzae



ORF5 (SEQ ID NO: 20) and TlyC proteins (SEQ ID NO: 1111) show 58% aa identity in 77 aa
overlap (BLASTp).

ORF5 2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61
 HMAIV+DE+G SGLVT EDI+EQIVG+IEDEFDE++ AD I +S T+ + A T+I+D
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224

ORF5 62 INTFFGTEYSIEEADTI 78
 N F T++ EE DTI
TlyC 225 FNAQFNTDFDDEEVDTI 241



ORF5ng-1 (SEQ ID NO: 28) also shows significant homology with TlyC (SEQ ID NO: 1111):

SCORES Initl: 301 Initn: 419 Opt: 668 

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

orf 5ng- 1 . pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK 



tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 

10 20 30 40 50 60 



60 70 80 90 100 109 

orf 5ng-l .pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE- -DKDEVLGILH 

|:::|||:||| II II- :::::::= = I : = I I I I I I I I : : hh-llll 

tlyc_haein VME I AELRVRD I M I PRSQ I I F I EDQQDLNTCLNT I I ES AHSRFPV I ADADDRDNI VG I LH 

70 80 90 100 110 120 



110 120 130 140 150 160 

orf 5ng- 1 . pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 

Mill- : I I hhllhhllhl : :|hll I I I I I h I I = I = : I I I 
tlyc_haein AKDLLKFLREDAEVFDLSSLLRPWIVPESKRVDRMLKDFRSERFHMAIWDEFGAVSGL 

130 140 150 160 170 180 

170 180 190 200 210 220 

orf 5ng-l .pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD 

I I :| I I : I I I I I I I I I I I I'- I || |:::| : = =:| hhhll hi- Hhl 
tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD- IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD 

190 200 210 220 230 

230 240 250 260 270 280 

orf 5ng- 1 . pep T I RRLGHS G I G - T P ARARRKS P YRRF AVHRRPRRQ P P P AHADGDPRE VS RAC PTAVS AQF 

II I : =1 I h 

tlyc_haein TIGGLIMQTFGYLPKRGEEIILKNLQFKVTSADSRRLIQLRVTVPDEHLAEMNNVDEKSE 
240 250 260 270 280 290 



Homology with a hypothetical secreted protein from E.coli: 



ORF5a (SEQ ID NO: 24) shows homology to a hypothetical secreted protein (SEQ ID NO: 1112) 
from E.coli: 



sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292

Score = 212 bits (533), Expect = 3e-54

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%)

Query: 2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 60
 D K F L+++L EP + +++L L+R + + ++ D DT LE V+D +D V
Sbjct: 10 DTISNKKGFFSLLLSQLFHGEPKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69

Query: 61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119
 RD MI RS+M LK N +++ +I++AHSRFPVI EDKD + GIL AKDLL +M +
Sbjct: 70 [sequence line lost in source]

Query: 120 [sequence line lost in source]
 E F + +LR AV VPE K + +LKEFR QR HMAIVIDE+GG SGLVT EDI+E IV
Sbjct: 130 AEAFSMDKVLRQAVVVPESKRVDRMLKEFRSQRYHMAIVIDEFGGVSGLVTIEDILELIV 189

Query: 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229
 G+IEDE+DE++ D +S W + A IED N FGT +S EE DT
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from 
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or 
diagnostics. 

ORF5-1 (SEQ ID NO: 22) (30.7kDa) was cloned in the pGex vector and expressed in E.coli, as 
described above. The products of protein expression and purification were analyzed by SDS- 
PAGE. Figure 2A shows the results of affinity purification of the GST-fusion protein. Purified 
GST-fusion protein was used to immunise mice, whose sera were used for Western blot analysis 
(Figure IB). These experiments confirm that ORF5-1 (SEQ ID NO: 22) is a surface-exposed 
protein, and that it is a useful immunogen. 

Example 5 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 29):

1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 

51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 

101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC 

151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 

201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 

251 GCGATGCAAC GCCGCCTGAA TGAgGGCATG GGAAAGCAGG CAGGACGGGC 

301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA 

351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 

401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 

4 51 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 

501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 

551 GATTGCGCTG CCC . .

This corresponds to the amino acid sequence (SEQ ID NO: 30; ORF7):

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 

101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP.. 
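
The step from SEQ ID NO: 29 to SEQ ID NO: 30 is a conceptual translation of the open reading frame. As a minimal sketch only, assuming the standard bacterial codon assignments (NCBI translation table 11) and assuming that any codon containing an ambiguous base call (the lower-case r and s in the listing) is reported as X, the first 48 nucleotides of SEQ ID NO: 29 can be translated as follows. The function name `translate` is illustrative and does not come from the source.

```python
# Codon table written in TCAG order (standard/bacterial codon assignments).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1.

    Whitespace is ignored, an incomplete trailing codon is dropped, and any
    codon containing a character other than A, C, G or T becomes 'X'.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 48 nucleotides of SEQ ID NO: 29 as printed above.
ORF7_START = "ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGT"
print(translate(ORF7_START))          # MRGGRPDSVTVQIIEG

# An ambiguous call ('r') gives X, as at residue 113 of SEQ ID NO: 30.
print(translate("ATGGCGArCCTGGTC"))   # MAXLV
```

The output agrees with the first 16 residues of SEQ ID NO: 30, and the second call shows how an unresolved base produces the X residues seen in the partial ORF7 sequence.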

Further sequence analysis revealed the complete DNA sequence (SEQ ID NO: 31): 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 

101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG

501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 32; ORF7-1): 

1 MLRKLLKWSA VFLTVSAAVF AA LLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 
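
The relationship between the partial ORF7 (SEQ ID NO: 30) and the complete ORF7-1 (SEQ ID NO: 32) can be checked by searching for the partial sequence inside the complete one. The sketch below is an illustration under stated assumptions: each X in the partial sequence is treated as a wildcard for exactly one residue, and the helper name `locate` is hypothetical rather than a tool named in the source.

```python
import re

# SEQ ID NO: 30 (partial ORF7) and SEQ ID NO: 32 (ORF7-1) as printed above,
# without position numbers and without the stop marker.
ORF7 = """
MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP
DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL
PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY
GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP
"""
ORF7_1 = """
MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG
NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP
YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K
"""


def locate(partial: str, full: str):
    """Find `partial` in `full`, letting each X match any single residue.

    Returns (first, last, resolved) with 1-based inclusive positions in
    `full` and the residues found at the X positions, or None if absent.
    """
    partial = "".join(partial.split())
    full = "".join(full.split())
    pattern = "".join("." if aa == "X" else re.escape(aa) for aa in partial)
    match = re.search(pattern, full)
    if match is None:
        return None
    resolved = "".join(
        full[match.start() + i] for i, aa in enumerate(partial) if aa == "X"
    )
    return match.start() + 1, match.end(), resolved


print(locate(ORF7, ORF7_1))   # (96, 282, 'SDRDP')
```

On the listings as printed, the 187 residues of ORF7 correspond to residues 96-282 of ORF7-1, and the five X positions are resolved as S, D, R, D and P in the complete sequence.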

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein encoded by yceg gene (accession P44270) (SEQ ID NO:
1113) of H. influenzae

ORF7 (SEQ ID NO: 30) and yceg proteins (SEQ ID NO: 1113) show 44% aa identity in 192 aa 
overlap: 

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G+ V+ I EG F RK ++ P + K SNE++ A ++ + 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115

N EG + PD+Y +DL++ + + + M++ LN+AW R + LP NPYEMLI+A +V 

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221 

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175 
EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 

ORF7 176 RGGLPPTPIALP 187 

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 


The complete length YCEG protein has sequence: 

1 MKKFLIAILL LILILAGVAS FS YYKMTEFV KTPVNVQADE LLTIERGTTS 

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA 

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF7 (SEQ ID NO: 30) shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) (SEQ
ID NO: 34) from strain A of N. meningitidis:

10 20 30 

orf 7 .pep MRGGRPDSVTVQI IEGSRFSHMRKVIDATP 

I I llll II II II II I llllllllll III II 
25 orf 7a AAYVLGVHNRLHTGTYRL PS EVSAWD I LQKMRGGRPDSVTVQ I IEGSRFSHMRKVIDATP 

70 80 90 100 110 120 

40 50 60 70 80 90 

orf 7 . pep D I GHDTKGWSNEKLMAE VAPDAFSGNPEGQFFPDS YE IDAGGSDLQ I YQTAYKAMQRRLN 

II II I I II II M II MUM M I M 1 1 II 1 1 1 1 Ml I MM I M M II llllllllll 

30 orf 7a DIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAMQRRLN 

130 140 150 160 170 180 

100 110 120 130 140 150 

orf 7 . pep EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 

I I I MM 1 1 1 II 1 1 1 1 M I; hllllllll 1 1 1 1 1 1 M , M 1 1 1 1 II llll 

35 orf 7a EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 

190 200 210 220 230 240 

160 170 180 

orf 7 . pep GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 

I I II I I I I I I I I I II I I I I I I I II I I I I II I I I I I 
40 orf 7a GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 

250 260 270 280 290 300 

orf 7a DGTGLSQFSHDLTEHNAAVRKY I LKKX 

310 320 330 



The complete length ORF7a nucleotide sequence (SEQ ID NO: 33) is:

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT

901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT

951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA



This is predicted to encode a protein having amino acid sequence (SEQ ID NO: 34): 



1 MLRKLLKWSA VFLTVSAAVF AA LLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG

151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K*

A leader peptide is underlined. 



ORF7a (SEQ ID NO: 34) and ORF7-1 (SEQ ID NO: 32) show 98.8% identity in 331 aa overlap: 



10 20 30 40 50 60 

orf 7a . pep MLRKLLKWS AVFLTVSAAVFAALLFVPKDNGRAYR I KI AKNQG I S S VGRKLAEDR I VFSR 

llllllllll IIIMIMIMI MIIIIIIIMMIIIIIIIIMM MIMIMM 

orf 7 - 1 MLRKLLKWS AVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRI VFSR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 7a . pep HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQI IEGSRFSHMRKV 

I I 1 1 1 I I I I ! 1 1 ! 1 1 1 I 1 1 1 1 1 1 1 1 1 1 M II I I I I I 1 1 1 1 1 I 1 1 1 1 I M 1 1 1 1 i 1 1 

orf 7 - 1 HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQI IEGSRFSHMRKV 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 7a . pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 

lllllll 1 1 1 1 1 M 1 1 M 1 1 II 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 M I Mill 

or f 7 - 1 IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDS YEIDAGGSDLQI YQTAYKAM 

130 140 150 160 170. 180 



190 200 210 220 230 240 

orf 7a . pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 

I I I 1 1 I 1 1 1 U 1 1 1 1 1 1 1 1 I I 1 1 1 1 1 1 M I I I I I M I I 1 1 I 1 1 1 U I 1 1 1 i 1 1 I M 

orf 7 - 1 QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD

190 200 210 220 230 240

250 260 270 280 290 300 

orf 7a . pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 

III IIMMMIII IMIIIMIIIIIIIIII MMIIIMIMMIIII IMIMI III 

orf 7 - 1 PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY

250 260 270 280 290 300 

310 320 330 

orf 7a .pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX
            ||||||||||||||||||||||||||||||||
orf 7-1     FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX

310 320 330 
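
The 98.8% identity reported for ORF7a and ORF7-1 can be reproduced from the two listings. The following is a minimal sketch, assuming an ungapped, position-by-position comparison (both sequences are 331 residues long, excluding the stop); the function names are illustrative and the alignment program actually used is not stated here.

```python
# SEQ ID NO: 34 (ORF7a) and SEQ ID NO: 32 (ORF7-1) as printed above,
# without position numbers and without the stop marker.
ORF7A = """
MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG
NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP
YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K
"""
ORF7_1 = """
MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK
LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR
PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG
NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP
YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA
YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY
FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K
"""


def _clean(listing: str) -> str:
    """Remove the whitespace used to group residues in the listing."""
    return "".join(listing.split())


def mismatches(a: str, b: str):
    """Return (position, residue_a, residue_b) for every differing column."""
    a, b = _clean(a), _clean(b)
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions over the full overlap."""
    length = len(_clean(a))
    return 100.0 * (length - len(mismatches(a, b))) / length


print(round(percent_identity(ORF7A, ORF7_1), 1))
# 98.8
print(mismatches(ORF7A, ORF7_1))
# [(128, 'E', 'G'), (171, 'R', 'Q'), (175, 'I', 'T'), (210, 'I', 'V')]
```

Four differing positions out of 331 give 327/331, which rounds to the 98.8% stated above.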

Homology with a predicted ORF from N. gonorrhoeae 

ORF7 (SEQ ID NO: 30) shows 94.7% identity over a 187aa overlap with a predicted ORF 
(ORF7.ng) (SEQ ID NO: 36) from N. gonorrhoeae: 

orf7    MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng  MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60

orf7    FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120
        |||||||||||||||||||||||||||||||||  ||||||||||||||||| | |||||
orf7ng  FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120

orf7    HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180
        |||  |||||||||||||||||||  ||||||||||||||||||||||||||||| ||||
orf7ng  HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180

orf7    PTPIALP 187
        || ||||
orf7ng  PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 236

An ORF7ng nucleotide sequence (SEQ ID NO: 35) is predicted to encode a protein having amino 
acid sequence (SEQ ID NO: 36): 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

201 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK*

Further sequence analysis revealed a partial DNA sequence of ORF7ng (SEQ ID NO: 37): 

1 . . taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA 

51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 

101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 

151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG

201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA

251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC

301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG

401 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC

451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC

501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG

551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC

601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC

651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA

701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC

751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa

801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT

851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA

This corresponds to the amino acid sequence (SEQ ID NO: 38; ORF7ng-1):

1 . . YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL

51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG

101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR

151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI

201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG

251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK*

ORF7ng-1 (SEQ ID NO: 38) and ORF7-1 (SEQ ID NO: 32) show 98.0% identity in 298 aa
overlap:

10 20 30 40 50 60 

orf 7-1 .pep KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

25 I II I 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 

' orf 7ng-l YRIKIAKNQGISSVGRKLAEDRIVFSRHVL 

10 20 30 

70 80 90 100 110 120 

or f 7 - 1 . pep TAAA YVLGVHNRLHTGTYRLPSEVS AWD I LQKMRGGRPDS VTVQ 1 1 EGSRFSHMRKV I DA 
30 | | | | | | | | | | | | | | | | | | || | | | | | | | | | | | | | | | | | | | | | | || | || | | II I I I I I I II I 

orf 7ng-l TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf 7-1 .pep TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 

35 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 I I I I I 1 1 1 1 1 M 1 1 1 II I 1 1 1 1 1 I I I 1 1 1 1 1 

orf 7ng- 1 TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR 

100 110 120 130 140 150 

190 200 210 220 230 .240 

or f 7 - 1 . pep LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV 
40 | | | || : | | | | | | | | | | | | || | | | | | : | | | | | | | | | | | | | | || | | | | | | I I | I I I I I I II 

o r f 7 ng - 1 LNEAWAGRQDGLP YKNP YEML I MASL I EKETGHE ADRDHVAS VFVNRLKI GMRLQTDPSV 

160 170 180 190 200 210 

250 260 270 280 290 300 

orf 7-1 .pep I YGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTP I ALPGKAALDAAAHP SGEKYLYFVS 
45 | | | | | | | | | | | M M | | | | | | | | | | | | | | | | | | | || | | | | | : | | | || I | | I I I I I I I I 

orf 7ng-l I YGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTR I ALPGKAAMDAAAHP SGEKYLYFVS 

220 230 240 250 260 270 

310 320 330 

orf 7-1 .pep KMDGTGLSQFSHDLTEHNAAVRKYILKKX
             |||||||||||||||||||||||||||||
orf 7ng-1    KMDGTGLSQFSHDLTEHNAAVRKYILKKX

280 290 

In addition, ORF7ng-1 (SEQ ID NO: 38) shows significant homology with a hypothetical E. coli
protein (SEQ ID NO: 1114):

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but
has 97 additional C-terminal residues [Escherichia coli] Length = 340

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 20/87 (22%), Positives = 40/87 (45%)

Query:  10 GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69
           G  ++G +L  D+I+    V      +    +    GTYR   +++  ++L+ +  G+
Sbjct:  49 GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108

Query:  70 SVTVQIIEGSRFSHMRKVIDATPDIGH 96
              ++++EG R S   K +   P I H
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57
Identities = 84/155 (54%), Positives = 111/155 (71%)

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179
           EG F+PD++   A  +D+ + + A+K M + ++ AW GR DGLPYK+  +++ MAS+IEK
Sbjct: 158 EGWFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239
           ET   ++RD VASVF+NRL+IGMRLQTDP+VIYGMG  Y GK+ +ADL   T YNTYT
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274
           GLPP  IA PG  ++ AAAHP+   YLYFV+   G
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312


Based on this analysis, including the fact that the H. influenzae YCEG protein possesses a possible
leader sequence, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 6 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 39): 



1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA

351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC

401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG
451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence (SEQ ID NO: 40; ORF9): 



1 . . RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 

51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 

101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 

151 HLDGREEVLA QADEGQ 

Further sequence analysis revealed the complete DNA sequence (SEQ ID NO: 41): 



1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC GGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 

901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGGG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA
1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG
1501 CTGAGCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 
1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 
1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 
1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 
1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 
1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 
1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 42; ORF9-1): 



1 MLPNRFKMLT VLTATLIAGQ VSAAGG GAGD MKQPKEVGKV FRKQQRYSEE 

51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 

151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 

201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL

251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL

301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA
351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG 

4 01 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS 
451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL

501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 

551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 

601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF9 (SEQ ID NO: 40) shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) (SEQ
ID NO: 44) from strain A of N. meningitidis:



10 20 30 40 50 

15 orf 9 .pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEI KNERARLA 

II :|:||:|:|:|lh II MM I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 9a MLPARFT I LS VLAAALLAGQAYAA - - GAADAKPPKEVGKVFRKQQRYS EEE I KNERARLA 

10 20 30 40 50 

60 70 80 90 100 110 

20 or f 9 . pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

llllllllllllll I ! I M I M , 1 1 1 1 1 M M 1 1 ! M 1 1 1 1 1 1 1 1 1 1 ■ 1 1 1 1 II 1 1 1 1 

or 1 9a AVGERVNQI FTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

120 130 140 150 160 

25 orf 9 .pep EM I YQKWRQ I EP I PGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 

1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II M I ! M I 1 1 II MM I 

O r f 9 a EM I YQKWRQ I E P I PGKAQKRAGWLRNVLRERGNQHLDGLE EXLAQADEXQNRRVFLLLAQ 

120 130 140 150 160 170 

orf 9a AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRLAKLDTEI 
30 180 190 200 210 220 230 

The complete length ORF9a nucleotide sequence (SEQ ID NO: 43) is: 

1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT 

51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG 

101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT

201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA

351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA

401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA

451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG

501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG

551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA

601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA

651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC

851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC

901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG

951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA

1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA

1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG

1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC

1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT

1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA

1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT

1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG

1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC

1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA

1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT

1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT

1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG

1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC

1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC

1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 44):

1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI

51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG

151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR

201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER

301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI

351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV

401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT

451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG

601 IALPQPSRKP RK*

ORF9a (SEQ ID NO: 44) and ORF9-1 (SEQ ID NO: 42) show 95.3% identity in 614 aa overlap: 

10 20 30 40 50 

MLPARFT I LSVLAAALLAGQAYAAG - - AADAKP PKEVGKVFRKQQRYSEEE I KNERARLA 

Ml II :|:||:|:|:|||: I |:| | I | I I I II I I I I I I I I I I I I I I I I I I I I 
ML PNRFKMLTVLTATL I AGQVSAAGGGAGDMKQ PKEVGKVFRKQQRYSEEE I KNERARLA 
10 20 30 40 50 60 



40 



orf 9a .pep 



orf 9-1 



60 70 80 90 100 110 

orf 9a . pep AVGERVNQI FTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

I MINIUM IMIIIIIIIIIIIIIIIIIMIIIIIIIIMIIIIMIIIIIII 

orf 9-1 AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

70 80 90 100 110 120 



120 130 140 150 160 170 

orf 9a . pep EM I YQKWRQ I E P I PGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 

M M I M M M M U I M M MM I ! 1 1 1 1 1 U I M 1 1 Ml lllllllllll 

orf9-l EM I YQKWRQ I E P I PGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 

130 140 150 160 170 180

180 190 200 210 220 230

orf 9a . pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRLAKLDTEI 

I I I I I I I I M ! M I I I I I II U I I I I I I! I I I I I I I I II I I I I I I I I I i I I II I I I 
orf 9-1 AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 
5 190 200 210 220 230 240 

240 250 260 270 280 290 

orf 9a . pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 

1 1 1 1 1 1 1 1 1 1 ' 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I! 1 1 1 1 1 1 1 1 < 1 1 1 1 1 M I 

orf 9-1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
10 250 260 270 280 290 300 

300 310 320 330 340 350 

orf 9a . pep ERN PNADLY I QAA I LAANRKEXAS V I DG YAE KAYGRGTGEQRGRAAMTAAM I YADRRD YT 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I • I I I : I I I I ' I I I I I I I • 
orf 9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 
15 . 310 320 330 340 350 360 

360 370 380 390 400 410 

orf 9a . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 

I 1 1 : 1 1 1 1 1 M I 1 1 1 1 1 1 1 1 III I 1 1 1 llllll IIIIIIMIIIII IMMI 

orf 9-1 KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
20 370 380 390 400 410 420 

420 430 440 450 460 470 

orf 9a . pep IQMFALSKLPDKREALRGLDKI I EKP P AGSNTELQAE ALVQRS WYDRLGKRKKM I SDLE 
I I I : I I I I I I ' I I I I I I I I I I ! I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I I 
orf 9-1 IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSWYDRLGKRKKMISDLE 
25 430 440 450 460 470 480 

480 490 500 510 520 530 

orf 9a. pep RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD

I M : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 :| M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 I 

orf 9-1 RAFRLAPDNAQ I MNNLGYSLLTDS KRLDEGFALLQTAYQ INPDDTAVNDS I GWAYYLKGD 

30 490 500 510 520 530 540 

540 550 560 570 . 580 590 

orf 9a . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

IIIIMIIIIIIIIII llllll llllll IIIMIIIMIII IIIIIIIMIIIIII 

orf 9-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
35 550 560 570 580 590 600 

600 610 
orf 9a .pep HGIALPQPSRKPRKX 

IIIMIIIIIIIIII 
orf 9-1 HGIALPQPSRKPRKX 
40 610 

Homology with a predicted ORF from N. gonorrhoeae 

ORF9 (SEQ ID NO: 40) shows 82.8% identity over a 163aa overlap with a predicted ORF 
(ORF9.ng) (SEQ ID NO: 46) from N. gonorrhoeae: 

Orf 9 RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 54 

45 ^ || :|:||:|:|:|||: || | | : | = : I I | I I I | : | I : : | I I I I I I I I I I I I 

orf 9ng MIMLPARFTILSVLAAALLAGQAYAA--GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58

orf9 LAAVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 114

1 I I j t I I I I ^ : Ill I I I 1 I I I I I I I I I t I 1 I I I I I ! I 

or f 9ng LAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVSLNAFE 118 

or f 9 QAEM I YQKWRQ I E P I PGKAQ KRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 166 

Illlllllllllllllhlll IIMIIIhl II III III Ihl 

or f 9ng QAEM I YQKWRQ I EP I PGEAQKPAGWLRNVLKEGGNPHLDRLEE VPAQSDYVHQPM I FLLL 178 

The ORF9ng nucleotide sequence (SEQ ID NO: 45) was predicted to encode a protein including
the amino acid sequence (SEQ ID NO: 46):



1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE

51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS

101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE

151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA

201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF

251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
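
The method used to predict the leader sequence and the transmembrane domain is not stated here. Purely as an illustration of why residues 173-189 stand out, the sketch below averages Kyte-Doolittle hydropathy values over that segment of SEQ ID NO: 46. The choice of the Kyte-Doolittle scale is an assumption made for this example, not the source's method, and the helper names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (positive values are hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# First 200 residues of SEQ ID NO: 46 as printed above.
ORF9NG = "".join("""
MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE
EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS
PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE
GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA
""".split())


def segment(seq: str, first: int, last: int) -> str:
    """Return residues first..last (1-based, inclusive)."""
    return seq[first - 1:last]


def mean_hydropathy(seq: str) -> float:
    """Average Kyte-Doolittle value over a peptide."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


leader = segment(ORF9NG, 1, 28)
tm = segment(ORF9NG, 173, 189)
print(leader)                          # MIMLPARFTILSVLAAALLAGQAYAAGA
print(tm)                              # MIFLLLVQAAVQHGGVA
print(round(mean_hydropathy(tm), 2))   # 1.62
```

The 17-residue segment is markedly more hydrophobic on average than the 200 residues taken as a whole, which is consistent with, but does not by itself establish, a membrane-spanning region.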



Further sequence analysis revealed the complete length ORF9ng DNA sequence (SEQ ID NO: 47): 



1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 

51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 

101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 

201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA

351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg

401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa
451 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT
501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 
551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 
601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 
651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 
851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 
901 AACCCGAATG CAAACCTGTA TATTCAGGCG GCGATATTGG CGGCAAACCG 
951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 

1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 

1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC

1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC
1301 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG

1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT

1401 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA
1451 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC
1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA
1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT
1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt

1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 

1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 

1751 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA 

1801 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 48): 



1 MLPARFTILS VLAAALLAGQ AYAAGAA DVE LPKEVGKVLR KHRRYSEEEI 

51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGEAQKPAG WLRNVLKEGG

151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK 

201 YEHLPEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA 

251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH 

301 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI

351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV

401 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST 

451 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS 

501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF 

551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG 

601 IALPEPSRKP RK*

ORF9ng (SEQ ID NO: 48) and ORF9-1 (SEQ ID NO: 42) show 88.1% identity in 614 aa overlap: 



25 



10 20 30 40 50 60 

orf 9 - 1 . pep MLPNRFKMLTVLTATLI AGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEI KNERARLA 

Ml II :|:||:|:|:||h III hh = I I I I I I h I I : = I I I I I I I I I I I I I I I 
orf 9ng-l MLPARFT I LSVLAAALLAGQAYAAG - -AADVELPKEVGKVLRKHRRYSEEE I KNERARLA 

10 20 30 40 50 






70 80 90 100 110 120 

or f 9 - 1 . pep AVGERVNQ I FTLLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVSLNAFEQA 

I II I I l-l I I I II I I I I I I I I I I I I M I I II I i I I I I I I I I I I I I I I I; I I M I I 
orf 9ng-l AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 






130 140 150 160 170 180 

orf 9-1 .pep EM I YQKWRQ I EP I PGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 

I II i I I I I I I I I I I : I I lllllllhl Mlll.hllll :h hlhlllhl 
orf 9ng- 1 EMI YQKWRQIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ 

120 130 140 150 160 170 






190 200 210. 220 230 240 

orf 9-1 .pep AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 

Mill M M 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1: 1 M 1 1 II 1 1 1 1 MIMIIIMII 

orf 9ng-l AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 
180 190 200. 210 220 230 






250 260 270 280 290 300 

orf 9-1 .pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 

llllllllllllllll llllll IMIIIMMMI IIIIMII:: I lllllll 
orf 9ng- 1 LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL 
240 250 260 270 280 290 






310 320 330 340 350 360 

orf 9-1 .pep ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 

MIIMIIIIIIIMIIIII llllll lllllll 1 1 M I M 1 1 M i I M 1 1 1 

orf 9ng- 1 EHNPNANLYIQAAI LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMI YADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 9-1 .pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

] I ! M II II 1 1 Ml 1 1 1 1 1 1 1 1 1 1 h 1 1 1 II I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M ! 1 1 1 1 1 

5 orf 9ng- 1 KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

360 370 380 390 400 410 

430 440 450 460 470 480 

or f 9 - 1 . pep I QMLALS KLPDKREALRGLDKI I EKP PAGSNTELQAEALVQRS WYDRLGKRKKM I SDLE 

I I M I I I I I I II I ! I lh:|| I |:::M I I I I = I I I : = I = = = I I I IIMM 
10 orf 9ng-l IQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLE 

420 430 440 450 460 470 

490 500 510 520 530 540 

orf 9-1 .pep RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
:: h I I M Ml I , I I I I I I : I I I II !l I II I I I I I I I I I I I I i I M I I I I ! I I I I I I 
1 5 orf 9ng- 1 TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 

480 490 500 510 520 530 

550 560 570 580 590 600 

or f 9 - 1 . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

I 1 1 M 1 : 1 1 1 1 1 1 > 1 1 1 1 1 1 1 i 1 1 1 1 1 M > I 1 1 1 1 1 1 MIMMIM 

20 orf 9ng- 1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 

540 550 560 570 580 590 

610 

or f 9 - 1 . pep HGIALPQPSRKPRKX 
: IMMIIMIM 
25 orf9ng-l YGIALPEPSRKPRKX 

600 610 

In addition, ORF9ng (SEQ ID NO: 48) shows significant homology with a hypothetical protein 
(SEQ ID NO: 1115) from P. aeruginosa: 

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION (ORF3) 
>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259 (X82071) orf3 [Pseudomonas aeruginosa] 
Length = 576 
Score = 128 bits (318), Expect = 1e-28 
Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%) 

Query : 67 VFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQAEMIYQKWR 126 

+++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A LA + +A W 
Sbjct: 53 LYSLLVAELAGQRNRFDIALSNYWQAQKTRDPGVSERAFRIAEYLGADQEALDTSLLWA 112 

Query: 127 QIEPI PGEAQKPAG WLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRI 172 

40 + P +AQ+ A ++ VL G+ H D L A++D + + 

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Query: 173 FXXXXXXXXXXXXXXXKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 232 

+ + KY + + A+ Q + +A+ L+ + 

Sbjct: 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214 

45 Query: 233 KLDTE I LPPTLMTLRLTARK YPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287 

E+PL+L + K P+GED++ + + LV + 
Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 
Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y+ + + 

Sbjct: 271' DDAKAEFAGLVQQFPDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 

Query: 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371 
5 LA +K+ A +D YA+ GG + T++ARDAR + P+ 

Sbjct: 331 RLAEEQKDTARALDEYAQ- -VGPGNDFLPAQLRQTDVLLKAGRVDEAAQRLDKARSEQPD 388 

Query: 372 YLFDKXXXXXXXXXXXXXXXXXXRQ I GRVRKLPEQQGRYFTADNLS KI QMLALS KLPDKR 431 

Y A L 1 + ALS + 

Sbjct: 389 Y AIQLYLIEAEALSNNDQQE 408 

.10 Query: 4 32 EALIGLNNI IAKLSAAGSTEPLAEALAQRS 1 1 YEQFGKRGKMIADLETALKLTPDNAQIM 4 91 

+A + + + ELL RS + + E+ +M DL + PDNA + 

Sbjct: 409 KAWQAIQEGLKQYP EDL - NLLYTRSMLAEKRNDLAQMEKDLRFVI AREPDNAMAL 462 

Query: 4 92 NNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551 
N LGY+L + R E L+ A+ + +NPDD A+ DS+GW Y +G A YLR + + 
15 Sbjct: 463 NALGYTLADRTTRYGE AREL I LKAHKLNPDDPA I LDSMGW INYRQGKLADAERYLRQALQ 522 

Query: 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

P+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 
Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 

gi|2983399 (AE000710) hypothetical protein (SEQ ID NO: 1116) [Aquifex aeolicus] 
Length = 545 
Score = 81.5 bits (198), Expect = 1e-14 
Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%) 

Query: 408 GRYFTADNL-SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQ 459 
           G Y A L K ++LA PDK+E L + +K + + L + 
Sbjct: 335 GNYEDAKRLIEKAKVLA PDKKEILFLEADYYSKTKQYDKALEILKKLEKDYPNDSR 390 

Query: 460 --RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS--DSKRLDEGFALLQ 513 
           +I+Y+ G L A++L P+N N LGYSLL +R++E L+ + 
Sbjct: 391 

Query: 514 
           A +. +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G + 
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPVVNEHVGDVLLKMGYK 510 

Query: 573 DQAVDVWTQAAHLRGDKK 590 
           ++A + + +A L + K 
Sbjct: 511 EEARNYYERALKLLEEGK 528 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 7 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 49): 






1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 
51 CAACCTGCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 
101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CATCAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

3 01 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CG<|CTGGGCG 
351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 
451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG 
501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 
551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 
601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 
651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 
701 GCCCAAGGCG AAGTCGTTTC CTAA 



This corresponds to the amino acid sequence (SEQ ID NO: 50; ORF11): 



1 ..NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG 

51 WAIIVLTIIV KAVLYPLTNA SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ 

101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFASVE LRQAPWLGWI 

151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMPLVFSXXF 

201 FFFPAGXVLY WVVNNLLTIA QQWHINRSIE KQRAQGEVVS * 



Further sequence analysis revealed the complete DNA sequence (SEQ ID NO: 51): 



1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

2 01 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

401 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

4 51 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC 

1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

1201 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

13 01 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT 

14 01 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACTTAT CTGAACCCGC 
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCGTTGGTT 
1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 
1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 
This corresponds to the amino acid sequence (SEQ ID NO: 52; ORF11-1): 

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 

251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD 

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP 

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 

501 FSVMFFFFPA GLVLYWVVNN LLTIAQQWHI NRSIEKQRAQ GEVVS* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a 60kDa inner-membrane protein (accession P25754) (SEQ ID NO: 1117) of 
Pseudomonas putida 

ORF11 (SEQ ID NO: 50) and the 60kDa protein (SEQ ID NO: 1117) show 58% aa identity in 229 
aa overlap (BLASTp). 

ORF11   2 LYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLTIIVK 61 
          LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K 
60K   324 

ORF11  62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGGCLPM 121 
          + +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+ 
60K   384 

ORF11 122 LLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPT 181 
          L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P 
60K   444 

ORF11 182 DPMQAKMMKIMPLVFSXXFFFFPAGXVLYWVVNNLLTIAQQWHINRSIE 230 
          DPMQAK+MK+MP++ PAG VLYWVVNN L+I+QQW+I R IE 
60K   504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWVVNNCLSISQQWYITRRIE 552 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF11 (SEQ ID NO: 50) shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) 
(SEQ ID NO: 54) from strain A of N. meningitidis: 



10 20 30 

35 orf 11 .pep NLYAGPQTTSVIANIADNLQLAKDYGKVHW 

1 1 1 M 1 1 1 1 1 M M 1 1 1 1 i I MINIM 

orf lla IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW 
280 290 300 310 320 330 
40 50 60 70 80 90 

or f 1 1 . pep FAS PLFWLLNQLHN I IGNWGWAI IVLT I I VKAVL Y P LTNAS YRSMAKMRAAAPKLQ A I KE 

1 1 1 1 ! 1 1 1 1 I i 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 ' I II I M i 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf lla FAS PLFWLLNQLHN I IGNWGWAI IVLT I IVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE 

340 . 350 360 370 380 390 



100 110 120 130 140 150 

orf 11 . pep KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 II 1 1 1 1 1 1 1 1 1 1 II 1 1 1 M I Ml 

orf lla KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQI PVF I GLYWALFAS VELRQAP WLGW I 

400 410 420 430 440 450 



160 170 180 190 200 210 

orf 11 . pep TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLY 

IMIMIMIIIIIIIMIIIIIMII IIIMIIMIIIIMI Mill 1 1 1 1 III 

orf lla TDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY 
460 470 480 490 500 510 



220 230 240 

orf 11 . pep WWNNLLTIAQQWHINRSIEKQRAQGEWSX 

I I : II I II I I ' I I I I I I I I I I I I I I I M I 
or f 1 1 a WVINNLLT I AQQWH I NRS I EKQRAQGE WSX 

520 530 540 



The complete length ORF11a nucleotide sequence (SEQ ID NO: 53) is: 



1 ANGGATTTTA AAAGACTCAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT 

51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC 

101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA 

3 01 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG GCAACAACAT 
351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

4 01 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA 
4 51 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 
501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 
551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 
601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 
651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG 
701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GATTGAACAC 
751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG 
801 CGCCGCTGGC GACTGCNGTA TNGACATCAA ACGCCGCAAC GACAAGCTGT 
851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA 
901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT 
951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGTTCGATG GCGAAAATGC 

1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

12 01 CGTATGGCGC AGCAACAAGC CATGATGCAG CTTTACACAG ACGAGAAAAT 
1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 
1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

13 51 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT 

14 01 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC 
14 51 CGCCGACCGA CCCGATGCAG GCGAAAATGA TGAAAATCAT GCCTTTGGTT 
1501 NTNTCNNNNA NGTTCTTCNN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 
1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 
This encodes a protein having amino acid sequence (SEQ ID NO: 54): 



1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL 

51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX 

101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH 

251 HFMSTWILQP KGGQSVCAAG DCXXDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGWAIIV LTIIVKAVLY PLTNASYRSM AKMRAAAPKL QAIKEKYGDD 

401 RMAQQQAMMQ LYTDEKINPL GGCLPMLLQI PVFIGLYWAL FASVELRQAP 

451 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 

501 XSXXFFXFPA GLVLYWVINN LLTIAQQWHI NRSIEKQRAQ GEVVS* 



ORF11a (SEQ ID NO: 54) and ORF11-1 (SEQ ID NO: 52) show 95.2% identity in 544 aa overlap: 



10 20 30 40 50 60 

orf lla . pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT 

III III MM II Mill 1 1 H 1 1 1 1 1 1 1 1 :| 1 1 1 1 M 1 1 1 1 M M Mill II 

orf 11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 
20 10 20 30 40 50 60 

70 80 90 100 110 120 

orf lla . pep DTVQAVI DEKSGDLRRLTLLKYKATGDXNKP F I LFGDGKXYT YXAXSELLDAQGNN I LKG 

III II II 1 1 Ml 1 1 MM I Ml I MM Mill II III I MINIMI MM 

orf 11-1 DTVQAVI DEKSGDLRRLTLLKYKATGDENKP F I LFGDGKE YTYVAQSELLDAQGNN I LKG 

25 70 80 90 100 110 120 



130 140 150 160 170 180 

orf lla . pep IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFD I ANGSGQTANL 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I M 1 1 M 1 1 1 1 1 1 1 II 1 1 1 M M 1 1 II 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 

orf 11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFD I ANGSGQTANL 

30 130 140 150 160 170 180 



190 200 210 220 230 240 

orf lla. pep SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT 

MM I Mill I Ml MM I II II Ml MM II I MM Ml MM I II I II III II Ml I 

orf 11-1 SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 
35 190 200 210 220 230 240 



250 260 270 280 290 300 

orf lla . pep XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRRNDKLYSTSVSVPLAAIQNGAK 

II I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MINIM II IIIIIINIINI IIININ 

orf 11 - 1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQNGAK 
40 250 260 270 280 290 300 



310 320 330 340 350 360 

orf lla. pep SXASINLYAGPQTTS VI AN I ADNLQLXKDYGKVHWFASPLFWLLNQLHNI IGNWGWAIIV 

: I lllllllllllll II I I I II I I I I I I II I II I M I II II M II I I II I 

orf 11-1 AEAS INLYAGPQTTS VI AN I ADNLQLAKDYGKVHWFAS PLFWLLNQLHNI IGNWGWAI IV 

45 310 320 330 340 350 360 



370 380 390 400 410 420 

orf lla . pep LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 

N I I II 1 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 II 1 1 N IN 1 1 N II 1 1 1 1 N 1 1 1 III 1 1 1 II 1 1 

orf 11-1 LTI I VKAVLYPLTNAS YRSMAKMRAAAPKLQAI KEKYGDDRMAQQQAMMQLYTDEKINPL 

50 370 380 390 400 410 420 
430 440 450 460 470 480 

orf lla.pep GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 

I I I I I I I I II I I I I I II I I I I I I I I ! I I I I I I I I II I I I I I I I I I I I I i I I I I I I I 
orf 11-1 GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQTY 

5 430 440 450 460 470 480 

490 500 510 520 530 540 

orf lla .pep LNP P P TD PMQ AKMM K I M PLVXSXX F FX F P AGLVL YWV I NNLLT I AQQWH I NRS I E KQRAQ 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i ii : 1 1 1 1 ii 1 1 hi m 1 1 ii 1 1 1 i 1 1 1 1 1 1 1 ii M 

orf 11-1 LNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQ 
10 490 500 510 520 530 540 

orf lla. pep GEWSX 
III 

orf 11-1 GEWSX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF11 (SEQ ID NO: 50) shows 96.3% identity over a 240aa overlap with a predicted ORF 
(ORF11.ng) (SEQ ID NO: 56) from N. gonorrhoeae: 



Or f 1 1 NLYAGPQTTSVIANI ADNLQLAKDYGKVHWFASPLFWLLNQLHNI IGNWGWAI I VLT 5 7 

I M 1 1 1 M 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 1 1 1 1 1 Ml I 

20 orf ling MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIVVLT 60 

or f 1 1 1 1 VKAVLYPLTNAS YRSMAKMRAAAPKLQAI KEKYGDDRMAQQQAMMQLYTDEKINPLGG 117 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M M 1 1 M 1 1 1 1 M M 1 1 1 1 h Ihllllll 

orf ling 1 1 VKAVLYPLTNAS YRSMAKMRAAAPELQT I KEKYGDDRMAQQQAMMQLFEDEE I NPLGG 120 

orf 11 CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 177 

25 1 1 IMM I III MM I III 1 1 II II I II II Mill MM 1 1 III II III II 1 1 II Mill 

orf ling CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180 

orf 11 PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 237 

M I M I II I M II 1 1 1 1 1 1 1 MINI Mill 1 1 1 II Ml 1 1 1 1 1 1 1 1 III I II II 

orf ling PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQGE 24 0 

30 orfll WS . 240 

III 

orfllng WS 243 

An ORF11ng nucleotide sequence (SEQ ID NO: 55) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 56): 

1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG 

51 NWGWAIVVLT IIVKAVLYPL TNASYRSMAK MRAAAPELQT IKEKYGDDRM 

101 AQQQAMMQLF EDEEINPLGG CLPMLLQIPV FIGLYWALFA SVELRQAPWL 

151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMPLVFS 

201 VMFFFFPAGL VLYWVVNNLL TIAQQWHINR SIEKQRAQGE VVS* 
Further sequence analysis revealed the complete gonococcal DNA sequence (SEQ ID NO: 57) to 
be: 



1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC 

101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTTAT 

2 01 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

2 51 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG 

4 01 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA 

4 51 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg 

701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac 

751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg 

801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt 

851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca 

901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT 

951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG 

1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC 

1051 ATTATCGGCA ACTGGGGCTG GGCAATCGTC GTTTTGACCA TCATCGTCAA 

1101 AGCCGTACTG TATCCATTGA CCAACGcctc ctACCGTTCG ATGGCGAAAA 

1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC 

12 01 GACCGTATGG CGCAACAGCA AGCGATGATG CAGCTTTACA AAgacgAGAA 

12 51 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT 

13 01 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA 

13 51 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT 

14 01 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC 
14 51 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG 
1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG 
1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGTGGCAC ATCAACCGCA 
1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A 

This encodes a protein having amino acid sequence (SEQ ID NO: 58; ORF11ng-1): 



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK 

151 IDKVYTFTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPVVYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 

251 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP 

301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN 

351 IIGNWGWAIV VLTIIVKAVL YPLTNASYRS MAKMRAAAPK LQTIKEKYGD 

401 DRMAQQQAMM QLYKDEKINP LGGCLPMLLQ IPVFIGLYWA LFASVELRQA 

451 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL 

501 VFSVMFFFFP AGLVLYWVVN NLLTIAQQWH INRSIEKQRA QGEVVS* 



ORF11ng-1 (SEQ ID NO: 58) and ORF11-1 (SEQ ID NO: 52) show 95.1% identity in 546 aa overlap: 

10 20 30 40 50 60 

orf ling- 1 . pep MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT 

- Ml I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II i 1 1 1 1 1 1 1 1 II M I' :l hi :M 1 1 1 1 M 1 1 1 1 II 1 1 

orf 11-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 
5 10 20 30 40 50 60 

70 80 90 100 110 120 

orf ling- 1 . pep DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG 

IIIIIIIIIIIIIMIIIIIIIIIMMMIhll lll.lil 1:1 I MlliMII 

orf 11- 1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 
10 70 80 90 100 110 120 

130 140 150 160 170 180 

orf ling- 1 .pep IGFSAPKKQYTLNGDTVEVRLSAPETNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANL 

lllllllllhhll 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i I ■ 1 1 1 I I Ml MIIIIIIIIIMI 

orf 11 - 1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 
15 130 140 150 160 170 180 

190 200 210 220 230 240 

orf ling- 1 . pep SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

I I I I I I I I M I I I I I I II II I I I I I I I I I i I ll I I I I I I I i I I I II I I I I I I I I I I I 
orf 11 - 1 SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 
20 190 200 210 220 230 240 

250 260 270 280 290 300 

orf ling- 1 . pep PTGWLGMIEHHFMSTWILQPKGGQNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGP 

I I I I I I I M I I M I I I II I I ' |:||| hi ! I I I M ; : I i I : I I I I M : 1 I : I 
orf 11-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA 
25 250 260 270 280 290 

310 320 330 340 350 360 

orf ling- 1 .pep KPKMAVNLYAGPQTTSVIANIADNLQIAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIV 

I : ::||MIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIMIIh 
orf 11-1 KAEAS INLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNI IGNWGWAI I 

30 300 310 320 330 340 350 

370 380 390 400 410 420 

orf11ng-1.pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 

IIIIIIIIIMIIIIMM III III lllhllllMIIIIIIIIIIIIII 1 1 1 1 1 1 

orf 11-1 VLTI IVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP 

35 360 370 380 3 90 400 410 



430 440 450 460 470 480 

orf ling- 1 .pep LGGCLPMLLQ I P VF I GLYWALFAS VELRQAPWLGW I TDLSRADP YY I LP I IMAATMFAQT 

IIIMIIIIIII I'l I III I IIIMIIIIIIIIIIMIIIMIMIIIIIIIM 

orf 11-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPI IMAATMFAQT 

40 420 430 44 0 4 50 4 60 470 

490 500 510 520 530 540 

orf ling- 1 .pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA 

Mlllllllll I hi IIIIMIMMMIIIIMIIIIIIIIIIIIIIIIIIIIM 

orf 11-1 YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRA 
45 480 490 500 510 520 530 



or f 1 lng - 1 . pep QGEWSX 
lllllll 

orf 11-1 QGEWSX 
50 540 
In addition, ORF11ng-1 (SEQ ID NO: 58) shows significant homology with an inner-membrane 
protein from the database (accession number P25754) (SEQ ID NO: 1117): 

ID 60IM_PSEPU STANDARD; PRT; 560 AA. 
AC P25754; 
DT 01-MAY-1992 (REL. 22, CREATED) 
DT 01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE) 
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 
DE 60 KD INNER-MEMBRANE PROTEIN. . . . 
SCORES Init1: 1074 Initn: 1293 Opt: 1103 
Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap 

10 20 30 40 

orf ling- 1 .pep MDFKR LTAFFAIALVIMIGW EKMFPT PKPVPAPQQAAQKQ 

. ||:|| ::|: ::: |: = : I = : I I I III : = :|: : 

p25754 MDI KRT I L I AALAWS YVNVLKWNDD YGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 

15 10 20 30 40 50 60 

50 6.0 70 80 90 

orf ling- 1 . pep AATAS AEAALAPAT PIT VTTDTVQAVIDEKSGDLRRLTLLKYKATGDE-NKPF 

: :|:||:: I :• I : : I Ih- : I I :||= -hi II h I II 

p2 5754 VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 

20 70 80 90 100 110 120 

100 110 120 130 140 

orf ling- 1 . pep VLFGDGKEYTYVAQSELLDAQGNNILKGIG- - - FSAPKKQYTL-NGD- - -TVEVRLSAPE 

II :| | :|:||| I ::| : - I -I =hl I =h :|ss::| 
P25754 QLFDNGGERVYLAQSGLTGTDGPDA- RASGRPLYAAEQKS YQLADGQEQLWDLKFS - - - 

25 130 140 150 160 170 

150 160 170 180 190 200 

orf ling- 1 .pep TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQTANLSADYRI VRDHS - EPEGQGYF- THSY 

Ih: I ::| : I : I I : I I 111= I = :: || | :| :: I = I 

P25754 DNGVNYIKRFSFKRGEYDLNVSYLIDNQSGQAWNGNMFAQLKRDASGDPSSSTATGTATY 
30 180 190 200 210 220 230 

210 220 230 240 250 260 

orf ling- 1 . pep VGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 

:| :: = | -lll-hl I- :| - II- -hh^M 1 = 

p2 5 754 LGAALWTAS EP YKKVSMKD I D KGSLKE NVS GGWVAWLQH Y FVT AW I - PAKSD 

35 240 250 * 260 270 280 

270 280 290 300 310 320 

orf ling- 1 . pep QNVCAQGDCRIDI KRRNDKLYSASVSVPLTAI PTRGPKPKMAVNLYAGPQTTSVIANIAD 
:|| :::::: | : s |: ::|: | | = :: II Ih | : ::: 

p25754 NNV VQTRKDSQGNYI IGYTGPVISVPA- GGKVETSALLYAGPKIQSKLKELSP 

40 290 300 310 320 330 

330 340 350 360 370 380 

orf ling- 1 .pep NLQLAKDYGKVHWF-ASPLFWLLNQLHNIIGNWGWAIWLTIIVKAVLYPLTNASYRSMA 

:hh III = II I = I : I I I I : : : I = = : I I I I I = h I I h = = h - : I h IMIIM 
p2 5754 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 

45 340 350 360 370 380 390 



390 400 410 420 430 440 

orf l ing- 1 . pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 






• I M = M | | ; • I I * * I I I 1 * • * I I I I • I I I I II I I I I I I I I * I * I * I I I •• I I I * I * 
p2 5 7 5 4 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLP I LVQMPVFLALYWVLL 

400 410 420 430 440 450 

450 460 470 480 490 500 

orf ling- 1 .pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF 

Ilhlllll: MINI l|::||||||:MII I III I MUMMI-I 
p25754 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 

510 520 530 540 

orf ling- 1. pep S VMF F F F P AGLVL YWWNNLLT I AQQWH I NRS I EKQRAQGE WSX 

: : I - I I I I I I I I I I I I hhllhhl II 
p25754 TFFFLWFPAGLVLYWWNNCLSISQQWYITRRIEAATKKAAA 
520 530 540 550 560 

Based on this analysis, including the homology to an inner-membrane protein from P. putida and 
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
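
The text does not state which method produced the transmembrane predictions. Purely as an illustration, a Kyte-Doolittle sliding-window hydropathy scan is one common first-pass check for membrane-spanning stretches. The sketch below is an assumption on that basis, not the analysis used here: the function names, the 19-residue window and the 1.6 cutoff are conventional illustrative choices, and the test segment is residues 351-380 of ORF11-1 (SEQ ID NO: 52).

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
# Ambiguous residues (X) are not handled in this sketch.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(seq, window=19):
    """Mean hydropathy of every full-length window along seq."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm(seq, window=19, cutoff=1.6):
    """True if any window exceeds the (assumed) hydropathy cutoff."""
    return any(score > cutoff for score in hydropathy_windows(seq, window))


# ORF11-1 residues 351-380 (hydrophobic stretch) and 61-90 (charged stretch).
hydrophobic_segment = "IGNWGWAIIVLTIIVKAVLYPLTNASYRSM"
charged_segment = "DTVQAVIDEKSGDLRRLTLLKYKATGDENK"

if __name__ == "__main__":
    print(has_candidate_tm(hydrophobic_segment))
    print(has_candidate_tm(charged_segment))
```

A scan like this only flags hydrophobic stretches; it does not establish topology or surface exposure, which is why it is offered here as a rough cross-check rather than a prediction method.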



Example 8 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 59): 



1 . . GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA 

251 ACCGTTACGA AGTT.TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG 

301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 
351 AGGCAACCTT CTTATTATCA CACACCCTTA A 

This corresponds to the amino acid sequence (SEQ ID NO: 60; ORF13): 



1 . . AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX 
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 
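
The correspondence between a partial DNA sequence and its amino acid sequence can be checked by translating in frame with the standard genetic code. The sketch below is a minimal illustration and rests on two assumptions: the helper name `translate` is hypothetical, and a codon containing an ambiguous base (such as `N`) is reported as `X` only when its possible expansions disagree. It is applied to the first 60 nucleotides of SEQ ID NO: 59.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate(dna):
    """Translate DNA in frame 1; undecidable codons become 'X'."""
    # Keep only nucleotide symbols, so spaces and position labels are ignored.
    dna = "".join(ch for ch in dna.upper() if ch in IUPAC)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        # Expand the codon to every concrete codon it could represent.
        residues = {
            CODON_TABLE["".join(p)]
            for p in product(*(IUPAC[base] for base in codon))
        }
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 60 nucleotides of the partial sequence (SEQ ID NO: 59).
partial = ("GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC "
           "TTTTGGTTGT NAGCGCGGCT")

if __name__ == "__main__":
    print(translate(partial))
```

In this stretch the codon `GTN` still resolves to valine because all four expansions encode the same residue, whereas a codon such as `GNC` would be reported as `X`.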

Further sequence analysis elaborated the DNA sequence slightly (SEQ ID NO: 61): 



1 . .GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT 

51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA 

101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG 

151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA 

201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA 

251 ACCGTTACGA AGTTTTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG 
301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA 

351 AGGCAACCTT CTTATTATCA CACACCCTTA A 



This corresponds to the amino acid sequence (SEQ ID NO: 62; ORF13-1): 

1 . . AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX 
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVFY RGTHWQAQNT 
101 GQEELEPGTR ALIVRKEGNL LIITHP* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF13 (SEQ ID NO: 60) shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) 
(SEQ ID NO: 64) from strain A of N. meningitidis: 

                             10        20        30        40        50
orf13.pep            AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13a      MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                    10        20        30        40        50        60

                   60        70        80        90       100       110
orf13.pep   VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA
            ||||||| ||||||||||||||| ||||| ||||||| |||| |||||||||||||||||
orf13a      VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
                    70        80        90       100       110       120

                  120
orf13.pep   LIVRKEGNLLIITHPX
            ||||||||||||  ||
orf13a      LIVRKEGNLLIIAKPX
                   130



The complete length ORF13a nucleotide sequence (SEQ ID NO: 63) is: 



1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG 

251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG 

351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA 

401 AACCTTAA

This encodes a protein having amino acid sequence (SEQ ID NO: 64): 



1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP*



ORF13a (SEQ ID NO: 64) and ORF13-1 (SEQ ID NO: 62) show 94.4% identity in 126 aa overlap:

                    10        20        30        40        50        60
orf13a.pep  MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13-1              AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                             10        20        30        40        50

                    70        80        90       100       110       120
orf13a.pep  VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
            ||||||| |||||||||||||||:|||||:||||||||||||||||||||||||||||||
orf13-1     VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
                   60        70        80        90       100       110

                   130
orf13a.pep  LIVRKEGNLLIIAKPX
            ||||||||||||  ||
orf13-1     LIVRKEGNLLIITHPX
                  120

Homology with a predicted ORF from N. gonorrhoeae 

ORF13 (SEQ ID NO: 60) shows 89.7% identity over a 126aa overlap with a predicted ORF
(ORF13ng) (SEQ ID NO: 66) from N. gonorrhoeae:



orf13                AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF  51
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF  60

orf13       VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA  111
            ||||||| ||||||||||| | | |||| |||||||| |||| |||||||||  ||||||
orf13ng     VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA  120

orf13       LIVRKEGNLLIITHP  126
            ||||||||||||  |
orf13ng     LIVRKEGNLLIIANP  135

The complete length ORF13ng nucleotide sequence (SEQ ID NO: 65) is: 



1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT 

51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG 

101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC 

151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT 

201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG 

251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC 

301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG 

351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA 

401 ACCCTTAA



This encodes a protein having amino acid sequence (SEQ ID NO: 66): 



1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP*

ORF13ng (SEQ ID NO: 66) shows 91.3% identity in 126 aa overlap with ORF13-1 (SEQ ID NO: 62):

                             10        20        30        40        50
orf13-1.pep          AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF
                    10        20        30        40        50        60

                   60        70        80        90       100       110
orf13-1.pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA
            ||||||| ||||||||||| | | |||| |||||||||||||||||||||||  ||||||
orf13ng     VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA
                    70        80        90       100       110       120

                  120
orf13-1.pep LIVRKEGNLLIITHPX
            ||||||||||||::||
orf13ng     LIVRKEGNLLIIANPX
                   130

Based on this analysis, including the extensive leader sequence in this protein, it is predicted that
ORF13 (SEQ ID NO: 60) and ORF13ng (SEQ ID NO: 66) are likely to be outer membrane
proteins. It is thus predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 9 

The following DNA sequence was identified in N. meningitidis (SEQ ID NO: 67): 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT

401 ATGCCGTC..

This corresponds to the amino acid sequence (SEQ ID NO: 68; ORF2): 

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 69): 






1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT
51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC
101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT
151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA
201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA
251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA
301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA
351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA
401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG
451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG
501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG
551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT
601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA
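
Sequence listings of this kind are set out in rows of fifty bases, each row preceded by the position of its first base. A minimal sketch, assuming that layout, of how such rows might be joined and their position counters checked is given below. `join_sequence_rows` is a hypothetical helper; the tolerance for stray leading digits (such as margin line numbers picked up during scanning) is a convenience assumption and not part of any sequence listing standard.

```python
import re


def join_sequence_rows(rows):
    """Concatenate numbered listing rows, checking each row's position counter.

    Each row is expected to look like '51 GATTGTCCTC GGCCCCGAAC ...'.
    Digits split by scanning ('2 01') or preceded by a stray margin number
    ('30 301') are accepted as long as they end with the expected position.
    """
    sequence = ""
    for row in rows:
        digits = re.sub(r"\D", "", row)              # all digits in the row
        residues = re.sub(r"[^A-Za-z.*]", "", row)   # bases/residues, '.' and '*'
        expected = str(len(sequence) + 1)
        if not digits.endswith(expected):
            raise ValueError(f"row should start at position {expected}: {row!r}")
        sequence += residues
    return sequence


if __name__ == "__main__":
    rows = [
        "1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT",
        "51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC",
    ]
    print(len(join_sequence_rows(rows)))
```

Because the check only requires the digits to end with the expected position, it is deliberately lenient; a stricter parser would be needed for machine-readable listings.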

This corresponds to the amino acid sequence (SEQ ID NO: 70; ORF2-1): 

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDFRPKHRAK PKLRVRKS* 

Further work identified the corresponding gene in strain A of N. meningitidis (SEQ ID NO: 71):



1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA
351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG
451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG 
501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 
551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT 
601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 
651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 72; ORF2a): 



1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF

51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 

101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG 

151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT

201 SLRKQAISRK RDLRPKSRAK PKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) (SEQ ID NO: 68) shows 97.5% identity
over a 118aa overlap with ORF2a (SEQ ID NO: 72):



                    10        20        30        40        50        60
orf2.pep    MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR
            | |||||||||||||||||||||| ||||| |||||||||||||||||||||||||||||
orf2a       MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf2.pep    KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| :
orf2a       KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP
                    70        80        90       100       110       120

                   130
orf2.pep    RCGKHPIRRHFRRYAV

orf2a       DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV
                   130       140       150       160       170       180

The complete strain B sequence (ORF2-1) (SEQ ID NO: 70) and ORF2a (SEQ ID NO: 72) show 
98.2% identity in 228 aa overlap: 

orf2a.pep   MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1      MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR  60

orf2a.pep   KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP  120
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf2-1      KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP  120

orf2a.pep   DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV  180
            |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1      DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV  180

orf2a.pep   QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX  229
            |||||||||||||||||||||||||||||||| ||| ||||||||||||
orf2-1      QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX  229
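
The percentage identity quoted for an alignment is the number of identical aligned residues divided by the length of the overlap. The sketch below (Python; `percent_identity` is a hypothetical helper, not the alignment program used here) assumes two already-aligned strings of equal length, counts every column of the overlap in the denominator, and never scores X or a gap character as identical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    # X (unknown residue) and '-' (gap) are assumed never to count as identities.
    identical = sum(1 for a, b in columns if a == b and a not in "X-")
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Final 48 residues of ORF2a and ORF2-1 (stop codon excluded).
    orf2a_tail = "QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKS"
    orf2_1_tail = "QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKS"
    print(f"{percent_identity(orf2a_tail, orf2_1_tail):.1f}% identity")
```

Over the full 228-residue overlap, ORF2a and ORF2-1 differ at four positions, so the same calculation gives 224/228, which rounds to the 98.2% quoted above.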

Further work identified a partial DNA sequence (SEQ ID NO: 73) in N. gonorrhoeae encoding the
following amino acid sequence (SEQ ID NO: 74; ORF2ng):

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV*

Further work identified the complete gonococcal gene sequence (SEQ ID NO: 75): 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC 

101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT

151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA 

251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa 

351 tccccttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA

401 TGCCGTCTGA ACGTTCCGAT ACTtCcgcCG AAACCCTTGG GGACGACAGG

451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG

501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg 

551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc 

601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA

651 ACACCGCGCC aAACCGAAat tgcgcgtcCG TAAATCATAA

This encodes a protein having the amino acid sequence (SEQ ID NO: 76; ORF2ng-1):

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL

51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK

101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR

151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT

201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS* 

The originally-identified partial strain B sequence (ORF2) (SEQ ID NO: 68) shows 87.5% identity
over a 136aa overlap with ORF2ng (SEQ ID NO: 74):

orf2.pep    MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR  60
            | ||||||| |||||||||||||| ||||| |||||||||||||||||| ||||||||||
orf2ng      MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR  60

orf2.pep    KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS  120
            | || ||||||||||||||| |||   ||||||||||||||||||||||||||| ||
orf2ng      KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP  120

orf2.pep    RCGKHPIRRHFRRYAV  136
            | ||| ||||||||||
orf2ng      RYGKHRIRRHFRRYAV  136

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) (SEQ ID NO: 70 & SEQ
ID NO: 76) show 91.7% identity in 229 aa overlap:

                    10        20        30        40        50        60
orf2-1.pep  MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR
            |||||||||:|||||||||||||||||||||||||||||||||||||||:||||||||||
orf2ng-1    MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf2-1.pep  KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
            |:|| ||||||||||||||| |||   |||||||||||||||||||||||||||||||||
orf2ng-1    KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf2-1.pep  DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV
            |:|||:||||||||||||| :|||||||: ||||||||||||:|||||||||||||||||
orf2ng-1    DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV
                   130       140       150       160       170       180

                    190       200       210       220      229
orf2-1.pep  Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX
            | :|||||||||||||||||||||||||:||||| |||||||||||||||
orf2ng-1    QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX
                   190       200       210       220       230

Computer analysis of these amino acid sequences indicates a transmembrane region (underlined),
and also revealed homology (34% identity, 59% similarity) between the gonococcal sequence and the TatB protein
(SEQ ID NO: 1118) of E. coli:

gnl|PID|e1292181 (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07
Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1  MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60
          MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +
Sbjct: 1  MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60

Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87
            +K+  +A+   +   LK +  +++ +
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88
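
In a BLAST report of this kind, "Identities" counts aligned positions carrying the same residue, whereas "Positives" also counts conservative substitutions (marked "+" in the middle line). The following sketch, using a hypothetical helper `parse_blast_summary`, recomputes the percentages from the counts in the summary line; it assumes only the "Name = n/m" layout shown above and is not part of the BLAST software itself.

```python
import re

SUMMARY = "Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)"


def parse_blast_summary(line):
    """Return {name: (count, total, rounded percentage)} for each 'Name = n/m' field."""
    stats = {}
    for name, count, total in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        count, total = int(count), int(total)
        stats[name] = (count, total, round(100 * count / total))
    return stats


if __name__ == "__main__":
    for name, (count, total, percent) in parse_blast_summary(SUMMARY).items():
        print(f"{name}: {count}/{total} = {percent}%")
```

Keeping the two figures separate matters when reporting homology: for this alignment the identity is 30/88 and the similarity, counting positives, is 52/88.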

Based on this analysis, it was predicted that ORF2 (SEQ ID NO: 68), ORF2a (SEQ ID NO: 72) and
ORF2ng (SEQ ID NO: 74) are likely to be membrane proteins and so the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
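
The method used to predict the transmembrane region is not stated. One widely used approach, given here purely as an illustration and not as the analysis actually performed, is a Kyte-Doolittle hydropathy profile: the mean hydropathy is computed over a sliding window, and windows of about 19 residues scoring above roughly 1.6 are taken as candidate membrane-spanning segments. The window size, the threshold and the function names below are assumptions of this sketch.

```python
# Kyte-Doolittle hydropathy values for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    values = [KYTE_DOOLITTLE.get(res, 0.0) for res in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


if __name__ == "__main__":
    n_terminus = "MFDFGLGELVFVGIIALIVLGPERLPEAAR"  # residues 1-30 of ORF2-1
    print(hydrophobic_windows(n_terminus))
```

With these assumed settings, the window starting at residue 1 of ORF2-1 exceeds the threshold, which is consistent with a membrane-spanning segment at the N-terminus.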

ORF2-1 (SEQ ID NO: 70) (16kDa) was cloned in pET and pGex vectors and expressed in E. coli,
as described above. The products of protein expression and purification were analyzed by SDS-PAGE.
Figure 3A shows the results of affinity purification of the GST-fusion protein, and Figure
3B shows the results of expression of the His-fusion in E. coli. Purified GST-fusion protein was
used to immunise mice, whose sera were used for Western blots (Figure 3C), ELISA (positive
result), and FACS analysis (Figure 3D). These experiments confirm that ORF37-1 (SEQ ID NO: 4)
is a surface-exposed protein, and that it is a useful immunogen.

Example 10

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 77): 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT

101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC

201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC 

251 ATTGATGCAC kGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC 

301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG 

351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC 

401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA

451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC
501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC
551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA
601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG..

This corresponds to the amino acid sequence (SEQ ID NO: 78; ORF15): 



1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK 

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEM..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 79): 



1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC
301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 
701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 
801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC 
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA 
951 AGGACAACCT TGA 

This corresponds to the amino acid sequence (SEQ ID NO: 80; ORF15-1): 



1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN

301 SHEGYGYSDE VVRQHRQGQP*

Further work identified the corresponding gene in strain A of N. meningitidis (SEQ ID NO: 81): 



1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG 

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 
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451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 82; ORF15a): 

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN

301 SHEGYGYSDE AVRRHRQGQP*

The originally-identified partial strain B sequence (ORF15) (SEQ ID NO: 78) shows 98.1%
identity over a 213aa overlap with ORF15a (SEQ ID NO: 82):

                    10        20        30        40        50        60
orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR
            |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15a      MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15a      KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15a      LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210
orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM
            |||||||||||||||||||||||||||||||||
orf15a      FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240

The complete strain B sequence (ORF15-1) (SEQ ID NO: 80) and ORF15a (SEQ ID NO: 82) show 
98.8% identity in 320 aa overlap: 

                    10        20        30        40        50        60
orf15a.pep  MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15a.pep  KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15a.pep  LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf15a.pep  FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf15a.pep  IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN
            |||||||||||||||||||||||||||||||||||||||||| ||||| |||||||||||
orf15-1     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
                   250       260       270       280       290       300

                   310       320
orf15a.pep  SHEGYGYSDEAVRRHRQGQPX
            |||||||||| || |||||||
orf15-1     SHEGYGYSDEVVRQHRQGQPX
                   310       320



Further work identified the corresponding gene in N. gonorrhoeae (SEQ ID NO: 83): 



1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT 

101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA

251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC

301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG 
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 
701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT 
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA 
801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC 
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA 
951 AGGGCAACCT TGA 



This encodes a protein having amino acid sequence (SEQ ID NO: 84; ORF15ng): 





1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK

51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT

101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN

151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN

201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA

251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN

301 SHEGYGYSDE AVRQHRQGQP*

The originally-identified partial strain B sequence (ORF15) (SEQ ID NO: 78) shows 97.2%
identity over a 213aa overlap with ORF15ng (SEQ ID NO: 84):

orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR  60
            |:|||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR  60

orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG  120
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG  120

orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF  180
            ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF  180

orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM  213
            |||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL  240

The complete strain B sequence (ORF15-1) (SEQ ID NO: 80) and ORF15ng (SEQ ID NO: 84)
show 98.8% identity in 320 aa overlap:

                    10        20        30        40        50        60
orf15-1.pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
            |:||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf15-1.pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf15-1.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
            ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf15-1.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf15-1.pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf15ng     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN
                   250       260       270       280       290       300

                   310       320
orf15-1.pep SHEGYGYSDEVVRQHRQGQPX
            |||||||||| ||||||||||
orf15ng     SHEGYGYSDEAVRQHRQGQPX
                   310       320

Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane 
lipoprotein lipid attachment site, as predicted by the MOTIFS program). 

indicates a putative leader sequence, and it was predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies.
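
A lipoprotein lipid attachment site of this kind can be screened for with a simple pattern search. The sketch below uses a simplified "lipobox" consensus, [LVI][ASTVI][GAS]C, searched near the N-terminus. This pattern, the 40-residue search limit and the helper name `find_lipobox_cysteine` are assumptions made for illustration; they are not the pattern used by the MOTIFS program.

```python
import re

# Simplified lipobox consensus: three small/hydrophobic residues followed by Cys.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox_cysteine(protein, search_limit=40):
    """Return the 1-based position of the lipobox cysteine, or None if absent.

    Only the first `search_limit` residues are searched, since the motif is
    expected at the end of an N-terminal signal peptide.
    """
    match = LIPOBOX.search(protein.upper()[:search_limit])
    # match.end() is the 0-based index just past the Cys, i.e. its 1-based position.
    return match.end() if match else None


if __name__ == "__main__":
    orf15_1_start = "MQARLLIPILFSVFILSACGTLTGIPSHGG"  # residues 1-30 of ORF15-1
    print(find_lipobox_cysteine(orf15_1_start))
```

For the N-terminus of ORF15-1 the sketch reports residue 19, the cysteine of the ILSAC motif.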

ORF15-1 (SEQ ID NO: 80) (31.7kDa) was cloned in pET and pGex vectors and expressed in
E. coli, as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figure 4A shows the results of affinity purification of the GST-fusion protein, and
Figure 4B shows the results of expression of the His-fusion in E. coli. Purified GST-fusion protein
was used to immunise mice, whose sera were used for Western blot (Figure 4C) and ELISA
(positive result). These experiments confirm that ORFX-1 is a surface-exposed protein, and that it
is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 85): 

1 ..GG.CAGCACA AAAAACAGGC GGTTGAACGG AAAAACCGTA TTTACGATGA

51 TGCCGGGTAT GATATTCGGC GTATTCACGG GCGCATTCTC CGCAAAATAT 

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT 

151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC 

201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG

251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA

301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT 

351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT 

401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT 

451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA 

501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG

551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA 



This corresponds to the amino acid sequence (SEQ ID NO: 86; ORF17):

1 ..GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV

51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH 

101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV 

151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 87):



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 Tc.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 88; ORF17-1): 



1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 XFGIMLLLIA GKMLYNLL* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical H. influenzae transmembrane protein HI0902 (accession number
P44070) (SEQ ID NO: 1119)



ORF17 (SEQ ID NO: 86) and HI0902 proteins (SEQ ID NO: 1119) show 28% aa identity in 192 aa 
overlap: 



ORF17 3 HKKQAVNG KTVFTMMPGM I FGVFT - GAFS AKY I P AFGLQ I F - - F I LFLTAVAFKTLHTDP 59 

HK + + V + P ++ VF G F + +IF +++L ++ D 

HI0902 72 HKLGNI VWQAVRI LAP VI MLS VF I CGLF I GRLDRE I S AKI FACLWYLATKMVLS I KKD - 130 

ORF17 60 QTASRPLPGLPXLTAVSTL FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP I 119 

Q ++ L L + L G SS GIGGG VPFL G +AIG+S+ + 

HI0902 131 QVTTKSLTPLSSVIG-GILIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLL 189 

ORF17 120 ALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVXXXXXXXXXXXXXX 179 

+SG S++++G +PE SLG++YLPAV ++A + + LG 

HI0902 190 GISGMFSFIVSGWGNPLMPEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKG 249

ORF17 180 FGIMLLLIAGKM 191

F + L+ + +A M 
HI0902 250 FALFLIWAINM 261 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF17 (SEQ ID NO: 86) shows 96.9% identity over a 196aa overlap with an ORF (ORF17a)
(SEQ ID NO: 90) from strain A of N. meningitidis:

                                                 10        20        30
orf17.pep                                GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS
                                         ||||||||  |||||||||| |||| || |
orf17a     QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALS
                 50        60        70        80        90       100

                   40        50        60        70        80        90
orf17.pep  AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
           |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a     AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG
                110       120       130       140       150       160

                  100       110       120       130       140       150
orf17.pep  GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17a     GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
                170       180       190       200       210       220

                  160       170       180       190
orf17.pep  AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
           |||||||||||||||||||||||||||||||||||||||||||||||
orf17a     AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
                230       240       250       260



The complete length ORF17a nucleotide sequence (SEQ ID NO: 89) is: 



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA



This encodes a protein having amino acid sequence (SEQ ID NO: 90):

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK

251 SFGIMLLLIA GKMLYNLL* 

ORF17a (SEQ ID NO: 90) and ORF17-1 (SEQ ID NO: 88) show 98.9% identity in 268 aa overlap: 

                    10        20        30        40        50        60
orf17a.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17a.pep  AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQIFFILFLT
            |||||||||||||||||||||||||||||||| |||| ||||||||||||||||||||||
orf17-1     AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17a.pep  AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17a.pep  IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17a.pep  HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||| ||||||||||||||||||
orf17-1     HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
                   250       260
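The identity figure quoted for this ungapped alignment can be reproduced by counting matching positions. The sketch below is illustrative only: the helper name `percent_identity` is an assumption, the two strings are SEQ ID NO: 90 and SEQ ID NO: 88 as listed above with the terminal stop omitted, and the unresolved residue X is simply counted as a mismatch.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# SEQ ID NO: 90 (ORF17a), grouped in tens as in the listing.
ORF17A = "".join("""
MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA
LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
SFGIMLLLIA GKMLYNLL
""".split())

# SEQ ID NO: 88 (ORF17-1); X marks the unresolved residue 251.
ORF17_1 = "".join("""
MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY
AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA
LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL
FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL
LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK
XFGIMLLLIA GKMLYNLL
""".split())

print(len(ORF17A), f"{percent_identity(ORF17A, ORF17_1):.1f}")  # 268 98.9
```

Three of the 268 positions differ (residues 93, 98 and 251), giving 265/268, which rounds to the 98.9% stated above.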

Homology with a predicted ORF from N. gonorrhoeae

ORF17 (SEQ ID NO: 86) shows 93.9% identity over a 196aa overlap with a predicted ORF
(ORF17.ng) (SEQ ID NO: 92) from N. gonorrhoeae:

orf17.pep                                GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS 30
                                         ||||||||  || | |||||||||| || |
orf17ng    QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102

orf17.pep  AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90
           |||||||||||||||||||||||||||  ||||||||||| ||||||||| |||||||||
orf17ng    AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162

orf17.pep  GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150
           |||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||
orf17ng    GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 202

orf17.pep  AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL 196
           ||||||||||||||||||||||||||| ||||||||||||||||||
orf17ng    AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268

An ORF17ng nucleotide sequence (SEQ ID NO: 91) is predicted to encode a protein having amino 
acid sequence (SEQ ID NO: 92): 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 



Further work revealed the complete gonococcal DNA sequence (SEQ ID NO: 93): 



1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT
351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG
451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT
501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 
551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 
601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT
651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 
701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA 
751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 
801 GCTTTAA 



This corresponds to the amino acid sequence (SEQ ID NO: 94; ORF17ng-1):



1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE

251 SFGIMLLLIA GKMLYNLL* 



ORF17ng-1 (SEQ ID NO: 94) and ORF17-1 (SEQ ID NO: 88) show 96.6% identity in 268 aa
overlap:



                     10        20        30        40        50        60
orf17-1.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1    MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf17-1.pep  AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
             |||||||||||||||||||||||| | |||||||||| ||||||||||||||||||||||
orf17ng-1    AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf17-1.pep  AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
             |||||||||  ||||||||||||||||||||| |||||||||||||||||||||||||||
orf17ng-1    AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf17-1.pep  IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
             |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
orf17ng-1    IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                    190       200       210       220       230       240

                    250       260      269
orf17-1.pep  HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
             |||||||||  ||||||||||||||||||
orf17ng-1    HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                    250       260

In addition, ORF17ng-1 (SEQ ID NO: 94) shows significant homology with a hypothetical
H. influenzae protein (SEQ ID NO: 1119):

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae
predicted coding region HI0902 [Haemophilus influenzae] Length = 264
Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23
Identities = 15/43 (34%), Positives = 23/43 (53%)

Query: 55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
          A+GTSFA +V T   S    HK   + W+ +  + P ++  VF
Sbjct: 52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23 
Identities = 44/114 (38%), Positives = 65/114 (57%) 

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209 
30 L G SS GIGGG VPFL G +AIG+S+ + +SG S + +V+G + 

Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207 

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263 

PE SLG++YLPAV ++A + + LG KL + LK+ F + L+++A M 

Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIWAINM 261 



This analysis, including the homology with the hypothetical H. influenzae transmembrane protein,
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 12 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 95):

1 . . GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC

51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG

101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG

151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT

201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC

251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG

301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA

351 A



This corresponds to the amino acid sequence (SEQ ID NO: 96; ORF18):



1 . . GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL 
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 
101 LMQVSVLVLL LSEIGR* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 97):



1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 98; ORF18-1):

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP

51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL

101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG

201 R*
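The relationship between the partial sequence (SEQ ID NO: 96) and the full-length sequence (SEQ ID NO: 98) can be checked by locating one within the other. This is a small illustrative sketch; the variable names are assumptions, and the stop symbol is omitted from both strings.

```python
# SEQ ID NO: 96 (partial ORF18), grouped in tens as in the listing.
ORF18_PARTIAL = "".join("""
GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL
LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA
LMQVSVLVLL LSEIGR
""".split())

# SEQ ID NO: 98 (full-length ORF18-1).
ORF18_1 = "".join("""
MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP
GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL
FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG
R
""".split())

offset = ORF18_1.find(ORF18_PARTIAL)  # 0-based index, or -1 if absent
first = offset + 1                    # 1-based residue number
last = offset + len(ORF18_PARTIAL)
print(first, last, len(ORF18_1))      # 86 201 201
```

On these listings the 116-residue partial sequence corresponds exactly to residues 86 to 201 of the full-length protein, that is, its C-terminal portion.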






Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF18 (SEQ ID NO: 96) shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) 
(SEQ ID NO: 100) from strain A of N. meningitidis: 



                                                 10        20        30
orf18.pep                                GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                         ||||||||||||||||||||||||||||||
orf18a     TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI
              60        70        80        90       100       110

                   40        50        60        70        80        90
orf18.pep  CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
           ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a     CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
             120       130       140       150       160       170

                  100       110
orf18.pep  QLRLGGLTAALMQVSVLVLLLSEIGRX
           ||||||||||||| |||||||||||||
orf18a     QLRLGGLTAALMQXSVLVLLLSEIGRX
             180       190       200

The complete length ORF18a nucleotide sequence (SEQ ID NO: 99) is: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA

501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG

551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA

601 AGATAA

This encodes a protein having amino acid sequence (SEQ ID NO: 100): 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP 

51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL

101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG

201 R* 

ORF18a (SEQ ID NO: 100) and ORF18-1 (SEQ ID NO: 98) show 99.0% identity in 201 aa
overlap:



                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX
            |||||||| |||||||||||||
orf18-1     GLTAALMQVSVLVLLLSEIGRX
                   190       200

Homology with a predicted ORF from N. gonorrhoeae 

ORF18 (SEQ ID NO: 96) shows 93.1% identity over a 116aa overlap with a predicted ORF 
(ORF18.ng) (SEQ ID NO: 102) from N. gonorrhoeae: 

orf18.pep                                GNGWQADPEHPLLGLFAVSNVSMTLAFVGI 30
                                         ||||||||||||||||||||||||||||||
orf18ng    TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115

orf18.pep  CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 90
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng    CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 175

orf18.pep  QLRLGGLTAALMQVSVLVLLLSEIGR 116
           ||||| | | |||| |   || ||||
orf18ng    QLRLGVLAAMLMQVAVTAMLLAEIGR 201

The complete length ORF18ng nucleotide sequence is (SEQ ID NO: 101): 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt 

51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG 

151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG

401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG

451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG 

551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC 

601 AGATGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 102): 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP
51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL
101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ
151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG
201 R* 

This ORF18ng (SEQ ID NO: 102) protein sequence shows 94.0% identity in 201 aa overlap with 
ORF18-1 (SEQ ID NO: 98): 



                     10        20        30        40        50        60
orf18-1.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
             ||||||||||||||||||||||||||||||||||| ||||||||| ||||| ||||||||
orf18ng      MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf18-1.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
             |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
orf18ng      LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf18-1.pep  YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng      YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                    130       140       150       160       170       180

                    190       200
orf18-1.pep  GLTAALMQVSVLVLLLSEIGRX
              | | |||| |   || |||||
orf18ng      VLAAMLMQVAVTAMLLAEIGRX
                    190       200

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
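The text does not state how the putative transmembrane domains were predicted. As an illustration only, one common approach is a sliding-window hydropathy profile; the sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6, and is not a description of the analysis actually performed. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not taken from this document).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(aa, 0.0) for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(protein: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy meets the cut-off."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# First 50 residues of SEQ ID NO: 102 (ORF18ng), as listed above.
ORF18NG_START = "MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMP"
print(candidate_tm_windows(ORF18NG_START))
```

Runs of consecutive start positions indicate a hydrophobic stretch long enough to span a membrane; the window length and cut-off are tunable assumptions rather than fixed properties of the method.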



Example 13 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 103): 



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG . . . 

This corresponds to the amino acid sequence (SEQ ID NO: 104; ORF19): 



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 
101 GAX... 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 105): 



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT
1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA
1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 
1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 
1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 
1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 
1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 
1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 
2001 GCGCGGCGAA CTCGACACCG TCCGCACCCA CAGCAGCGGA ACACAAAGCC 
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 
2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 
2151 A 

This corresponds to the amino acid sequence (SEQ ID NO: 106; ORF19-1): 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL

151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG
301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 
651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 
701 YRAYRQIPHR QPQNAA*



Computer analysis of this amino acid sequence gave the following results: 



Homology with predicted transmembrane protein YHFK of H. influenzae (accession number
P44289) (SEQ ID NO: 1120)

ORF19 (SEQ ID NO: 104) and YHFK proteins (SEQ ID NO: 1120) show 45% aa identity in 97 aa
overlap:

ORF19    6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65
           L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T
YHFK     5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64

ORF19   66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102
           +  F++SS   Q  +G  + +I+ MT++T  FT++GA
YHFK    65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 (SEQ ID NO: 104) shows 92.2% identity over a 102aa overlap with an ORF (ORF19a)
(SEQ ID NO: 108) from strain A of N. meningitidis:

                   10        20        30        40        50        60
orf19.pep  MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
           |||| ||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19a     MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                   10        20        30        40        50        60

                   70        80        90       100
orf19.pep  NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX
           ||| |||||||||| |||||||||||||||||||  ||| ||
orf19a     NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
                   70        80        90       100       110       120

orf19a     TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
                  130       140       150       160       170       180

The complete length ORF19a nucleotide sequence (SEQ ID NO: 107) is: 



1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT

251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC 

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA 

501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG 

551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT
1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 

2151 A 






This encodes a protein having amino acid sequence (SEQ ID NO: 108): 



1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII

151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG

301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE

351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 
551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY

701 YRAYRQIPHR QPQNAA* 

ORF19a (SEQ ID NO: 108) and ORF19-1 (SEQ ID NO: 106) show 98.3% identity in 716 aa 
overlap: 



50 10 20 30 40 50 60 

or f 19a. pep MKTPPLKPLL I TSLPVFASVFTAAS I VWQLGEPKLAMPFVLGI I AGGLVD LDNRLTGRLK 

1 1 1 1 MM MMM MMMMMMI MMMMIMMMMMMIMMM 

orf 19-1 MKTPLLKPLLITSLPVFASVFTAAS I VWQLGEPKLAMPFVLGI IAGGLVDLDNRLTGRLK 

10 20 30 40 50 60

70 80 90 100 110 120

orf 19a. pep NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY 

I I :| I II I I I I I I : I I I I I I I I I I I I M I I I I I I M I: I I I I I M I I I I I I Ml I I I 
orf 19-1 NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 
5 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 19a. pep TTLTYTPETYWLTNPFMILCGTVLYSTAI ILFQI ILPHRPVQENVANAYEALGSYLEAKA 

M M II 1 1 M II 1 1 II 1 1 1 1 II M II M Ml 1 1 1 M II II 1 1 MM I MM I Ml 1 1 II 

orf 19-1 TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA 
10 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 19a . pep DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

III M 1 1 1 II I II 1 1 II I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II M I Ml Ml 1 1 M M 1 1 1 1 

orf 19-1 DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
15 190 200 210 220 230 240 

250 260 270 280 290 300 

orf 19a. pep D I HER I SS AHVDYQEMSEKFKNTD 1 1 FR I HRLLEMQGQACRNTAQALRAS KDYVYSKRLG 

MIMIMI IIIIIIMIIIIIIIIMIII IMIIIIIIIIMIMIIIM Mill 

orf 19-1 D I HER I SS AHVDYQEMSEKFKNTD 1 1 FR I HRLLEMQGQACRNTAQALRAS KDYVYS KRLG 

20 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 19a. pep RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 

1 1 1 II 1 1 M 1 1 1 1 II I M 1 1 II I II M 1 1 1 II II M 1 1 1 M 1 1 1 1 MM 1 1 M I II 1 1 

orf 19-1 RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
25 310 320 330 340 350 360 

370 380 390 400 410 420 

orf 19a . pep ALETGSLKNTWQAIRPQLNLESGVFRHAVRLSLWAAACTIVEALNLNLGYWILLTALFV 

1 1 1 M 1 1 1 M MM II II II 1 1 1 II II I M 1 1 1 1 M I II I II II I II 1 1 II 1 1 II I M 

orf 19-1 ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLWAAACTIVEALNLNLGYWILLTALFV 
30 ' 370 380 390 400 410 420 

430 440 450 460 470 480 

orf 19a. pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 

MMIMM MM MMMMMMIMI MM MM MMMMMMIMM 

orf 19-1 CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
35 430 440 450 460 470 480 

490 500 510 520 530 540 

orf 19a . pep .STFFITIQALTSLSLAGLDVYAAMPVRI IDTI IGASLAWAAVSYLWPDWKYLTLERTAAL 

MMMMIMM MMMMMMMMIMM MMMMMMMMIMM I 

orf 19-1 STFFITIQALTSLSLAGLDVYAAMPVRI IDTI IGASLAWAAVSYLWPDWKYLTLERTAAL 

40 490 500 510 520 530 540 

550 560 570 580 590 600 

orf 19a . pep AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 

M M 1 1 1 1 II 1 1 1 1 M 1 1 1 1 1 1 1 II 1 1 II I II II 1 1 1 1 II I 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II I 

orf19-1 AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ
550 560 570 580 590 600

610 620 630 640 650 660 

orf 19a . pep PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 

IMIIIIIIMM MM MM MMMMIMMMMMMMMMIMIMM 

orf 19-1 PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 
610 620 630 640 650 660
670 680 690 700 710

orf19a.pep QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX

IMIIIIIIIIIIIIIIII IMIIII IIMIIIIIIMIIMIIIIIIIIIII MM 

orf 19-1 QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 
670 680 690 700 710

Homology with a predicted ORF from N. gonorrhoeae 

ORF19 (SEQ ID NO: 104) shows 95.1% identity over a 102aa overlap with a predicted ORF
(ORF19.ng) (SEQ ID NO: 110) from N. gonorrhoeae:



orf 19. pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60 

10 MMMMMMMMMM MMMMIMMMMMMMMMMMI Mill 

orf19ng MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60






orf 19. pep NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 103 

MM MMMMMMMM MMMMMMI MIMI 

orf19ng NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 120
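
The identity figure quoted for this overlap can be checked directly from the aligned rows. The following is a minimal sketch, assuming an ungapped alignment in which identity is the number of positions carrying the same residue divided by the overlap length, and in which an undetermined residue (X) counts as a mismatch. The function name `percent_identity` is illustrative only and is not part of the original analysis.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical positions, overlap length, percent identity)
    for two ungapped, equally long aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Assumption: an undetermined residue 'X' never counts as a match.
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "X")
    overlap = len(seq_a)
    return matches, overlap, 100.0 * matches / overlap


# First 102 residues of ORF19 and of ORF19.ng, copied from the alignment.
ORF19 = (
    "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK"
    "NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA"
)
ORF19NG = (
    "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"
    "NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGA"
)

matches, overlap, pct = percent_identity(ORF19, ORF19NG)
print(f"{matches}/{overlap} = {pct:.1f}% identity")  # 97/102 = 95.1% identity
```

Under these assumptions the five differing positions (four undetermined residues and one T/A substitution) leave 97 identical residues out of 102, which agrees with the reported 95.1%.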

An ORF19ng nucleotide sequence (SEQ ID NO: 109) is predicted to encode a protein having
amino acid sequence (SEQ ID NO: 110):



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT *

Further work revealed the complete nucleotide sequence (SEQ ID NO: 111): 



1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA 

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA

501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT

651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC 

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG 

851 CCCAAGCCAT CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA

901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA 

951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca 

1001 GCGTcgacca gcagtTCcgc caactCCGAC ACAgcgactC CCCCGCcgaa 
1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT

1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC

1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC

1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC 

1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT

1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA 

1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA 

1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT 

2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence (SEQ ID NO: 112; ORF19ng-1):

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*

ORF19ng-1 (SEQ ID NO: 112) and ORF19-1 (SEQ ID NO: 106) show 95.5% identity in 716 aa
overlap:



10 20 30 40 50 60 

orf19-1.pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK

; I I : U i 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 II 

orf19ng-1 MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK

10 20 30 40 50 60 



70 80 90 100 110 120 

orf19-1.pep NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY

I : I i I . I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I M M I I I 
orf 19ng-l NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 

70 80 90 100 110 120 
130 140 150 160 170 180

orf 19-1. pep TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA 

1 1 1 1 1 1 , 1 1 1 1 1 1 II M i 1 1 M I ! M I hll ihl 1 1 i II M I II 1 1 hi 1 1 1 M M I 

orf19ng-1 TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA

130 140 150 160 170 180






190 200 210 220 230 240 

orf 19-1. pep DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I I I I M I I I I I I I I 
orf 19ng-l DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf19-1.pep DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG

h U I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I Ih h 1 1 1 1 1 h 1 1 h 1 1 1 hi- 1 1 1 M 1 1 1 1 

orf19ng-1 DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 19-1 .pep RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 

I I I I I I I I I I I I I II : I I I I I I I I I I I I I I I I I I I III II I : I = I I I I I I I I I I I I 
orf 19ng-l RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf19-1.pep ALETSSLKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV

I hhlllllllllllll 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 II 1 1 1 I M 1 1 1 1 1 1 1 1 1 

orf19ng-1 ALETGSFKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFV

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 19-1 .pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 

llllllllllll I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I h I I I I I I I I I I I 
orf 19ng-l CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF 

430 440 450 460 470 480






490 500 510 520 530 540 

orf19-1.pep STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL

M III II I II lllllll Mill I MIMIIMIII IIIMMI hllMIIIII II II 

orf19ng-1 STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL

490 500 510 520 530 540 






550 560 570 580 590 600 

orf 19-1 .pep AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 

M I I : I : I h I I : I II h I I I I hi I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 19ng-l AVCSSGTYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 

550 560 570 580 590 600






610 620 630 640 650 660 

orf 19 - 1 . pep PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF 
II I I I I I I I h I I I I I h I I h I I I I I I I I I I I I I h I I I II I I I I h h h I I I I 
orf 19ng- 1 PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF 

610 620 630 640 650 660 






670 680 690 700 710 

orf 19-1 .pep QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 

lllllllllll II I h h I I I I I I I I hi I I II I I I I I I h h I h I I I II I I I I 
orf 19ng-l QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX 

670 680 690 700 710 
In addition, ORF19ng-1 (SEQ ID NO: 112) shows significant homology to a hypothetical
gonococcal protein (SEQ ID NO: 1121) previously entered in the databases:

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|ell54438
(AJ002423) hypothetical protein [Neisseria gonorrh] Length = 417
Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203

Identities = 301/326 (92%), Positives = 306/326 (93%)

Query: 307 RQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 366
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS
Sbjct:   1 RQSLRLLSDGNDSXDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 60

Query: 367 FKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFVCQPNYT 426
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT
Sbjct:  61 FKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTRLFVCQPNYT 120

Query: 427 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 486
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT
Sbjct: 121 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 180

Query: 487 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 546
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG
Sbjct: 181 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 240

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632
           K   ALTGYISALG   ++  +  +P
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326
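
The percentages in the database hit above follow from the raw counts. The short sketch below assumes, from the figures themselves rather than from any rule stated in this document, that the search program truncates the percentage to a whole number instead of rounding it; `truncated_percent` is an illustrative name.

```python
def truncated_percent(count, aligned_length):
    """Whole-number percentage, truncated toward zero.

    Truncation (not rounding) is an assumption inferred from the
    reported figures.
    """
    return (100 * count) // aligned_length


identities, positives, aligned = 301, 306, 326
print(truncated_percent(identities, aligned))  # 92
print(truncated_percent(positives, aligned))   # 93
```

The distinction matters for the second figure: 306/326 is about 93.9%, which would round to 94% but is reported as 93%.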



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein (SEQ ID NO: 1120), it is predicted that the proteins from N. meningitidis 
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies.
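
The document does not state how the putative transmembrane domains were identified. One common approach, shown here purely as an illustration, is a sliding-window hydropathy average on the Kyte-Doolittle scale, flagging windows of about 19 residues whose mean exceeds roughly 1.6. The function names, window size and threshold below are assumptions and should not be read as the method used for the analysis above.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0."""
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the threshold."""
    means = window_hydropathy(protein, window)
    return [i + 1 for i, mean in enumerate(means) if mean > threshold]


# Residues 1-60 of ORF19ng-1 (SEQ ID NO: 112), copied from the listing above.
n_terminus = "MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK"
print(candidate_tm_starts(n_terminus))
```

Overlapping windows that pass the threshold would normally be merged into a single candidate segment; that step is left out to keep the sketch short.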

Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID 
NO: 113): 

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT .TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGAGTT

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT 

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC 

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT 

551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA 

601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA 

751 CACGATTTTC GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT 

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC 

851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 

901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 

951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAmGCCC 

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 

1201 CTTTAyCGGC CCACTrrAAC rCaj|TCGGAC TTTCGCTTGC CATCGGTCTG 

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT

1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 114; ORF20): 



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL 

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT 

351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC 

451 SRSP* 
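
The amino acid sequence above can be derived from the nucleotide sequence by reading codons with the standard genetic code. The sketch below is illustrative only: `translate` is a hypothetical helper, and it assumes that any codon containing an undetermined base (an ambiguity letter such as y, r, m, w or s, or a gap) is reported as X, which is consistent with the X residues in the partial sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate from the first base; stop after the first stop codon.

    Codons that are not plain A/C/G/T triplets are reported as 'X'
    (an assumption made for this sketch).
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First 30 bases of SEQ ID NO: 113.
print(translate("ATGAATATGCTGGGAGCTTTGGCAAAAGTC"))  # MNMLGALAKV
# Internal fragment of SEQ ID NO: 113 containing the ambiguity code 'y'.
print(translate("ACCGCGCyGGCGTGGGCGGTC"))           # TAXAWAV
```

A stricter implementation could resolve ambiguity codes whose alternatives all encode the same residue; this sketch simply reports X in every such case.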

These sequences were elaborated, and the complete DNA sequence (SEQ ID NO: 115) is:



1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA

451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC 

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT

1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC

1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC

1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA



This corresponds to the amino acid sequence (SEQ ID NO: 116; ORF20-1): 



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL
501 GFRPRHFKRV EN*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the MviN virulence factor of S. typhimurium (accession number P37169) (SEQ ID
NO: 1122)



ORF20 (SEQ ID NO: 114) and MviN proteins (SEQ ID NO: 1122) show 63% aa identity in 440aa
overlap: 



Orf20   1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
          MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
MviN   14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Orf20  61 AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
          +QAFVPILAEYK  + +EA   F+ +V+G+L+  L +VT  G+LAAPWVI V+AP FA
MviN   74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
          ADKF L+  LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF  P
MviN  134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
          YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D    RV+KQM PAILGV
MviN  194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
          SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK  A+ +
MviN  254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
          +++  L+DWGLRLC LL LP+AV L +L+ PL  +LF Y  FT FDA MTQ ALIAYS G
MviN  314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXXXXXXXXXXXXXXXXXCI 420
          LIGLI++KVLAPGFY+RQ+I  PVKIAI TLI  QLMNL F                 C+
MviN  374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Orf20 421 NAGLLFYLLRRHGIYQPXQG 440
          NA LL++ LR+  I+ P  G
MviN  434 NASLLYWQLRKQNIFTPQPG 453





Homology with a predicted ORF from N. meningitidis (strain A)

ORF20 (SEQ ID NO: 114) shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) 
(SEQ ID NO: 118) from strain A of N. meningitidis: 



10 20 30 40 50 60 

orf 2 0 . pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

1 1 1 II h M II 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 I II 1 1 II I i 1 1 ! I i 1 1 1 1 1 1 1 1 1 1 II 1 1 

orf20a MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

10 20 30 40 50 60 



70 80 90 100 110 120 

orf20.pep AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD
20 | | | | | | | | | | | | | | | | | | | : | | | | | | | | | | | | | M | | | | | | | | | | | | | | | | | | | | : | | = | 

orf20a AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 20 . pep ADKFQLSIDLLRIT FPYILLISLSSFVGSVLN SYHKFGIPAFTPX FLNVSFIVFALFFVP 

25 | | | | | | | | | | | | | | || | | | | | | | | | | | | | | | | | | | | | : | | | | | | : | | | | | | | | | | | | | | | 

orf 20a ADKFQLSIDLLRIT FPYILLISLSSFVGSVLN SYHKFSIPAFTPT FLNVSFIVFALFFVP 

130 140 150 160 170 180 

190 200 . 210 220 230 240 

orf 20 . pep YFDPP VTAXAWAVFVGGILQLX FQLPWLAKLGFLKLPKLSFKDAAVNRVMKQ MAPAILGV 

30 | | | | | | | | | | | | | | | | || | | | | | | | | | | M | | | | M I I I I I I I I I I I I II I I I I I I I I 

orf20a YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV

190 200" 210 220 230 240 

250 260 270 280 290 300 

orf 20 . pep SVAQVSLVIN TIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

35 TTHTT^TTTTi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 = 1 1 1 i I i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 20a SVAQISLVIN TIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 2 0 . pep EQFSALLDWGLR LCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQHA LIAYSFG 
40 | | | | | | | | | | | | | | | | | | | | | | | : || | | | | | | | | | | | || | | | | | | | | | || | | | | | | | | 

orf20a EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 20 . pep LIGLIMIKVLA PGFYARQNIXXPV KIAIFTLICXQLMNLXFX GPLXXIGLS LAIGLGACI 
45 I I I I I I I I I I I I I I I I I I : II II I II Nihil II I I III :|lllllllllll 
orf20a LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI

370 380 390 400 410 420 

430 440 450 

orf 2 0 . pep NAGLLFYLLRRHG I YQPXQGLGS VLXQKCCSRS PX 

I I ] I II I II I I M I I M : | : : | : 
orf20a NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA

430 440 450 460 470 480 

The complete length ORF20a nucleotide sequence (SEQ ID NO: 117) is: 



1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG
101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA
251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG
301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT
351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA
401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA
451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT
501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC
551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA
601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC
651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG
701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA
801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG
851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT
951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG
1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT
1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG
1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC
1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG
1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA
1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC
1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC
1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA



This encodes a protein having amino acid sequence (SEQ ID NO: 118): 



1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL
501 GFRPRHFKRV ES*
ORF20a (SEQ ID NO: 118) and ORF20-1 (SEQ ID NO: 116) show 96.5% identity in 512 aa
overlap: 

10 20 30 40 50 60 

or f 20a . pep MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

5 MMIMIMMIMIMMMIMM IMMIII IMIIIII Illlllll III 

orf 20 - 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf20a.pep AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD

10 II II M M 1 1 1 II 1 1 II II M 1 1 1 1 1 II M 1 1 1 III I M I II I II I II II II M M I M 

orf 20-1 AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf20a.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFSIPAFTPTFLNVSFIVFALFFVP

15 M 1 1 II II II M I I III 1 1 II M II M II I II MM MM II 1 1 II M I II I M I M M I 

orf20-1 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 20a . pep YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 

20 M 1 1 1 1 1 1 1 M 1 1 1 1 M I II I II 1 1 1 1 1 1 1 1 II 1 1 1 MM 1 1 1 1 II 1 1 1 1 1 II II II M 

orf20-1 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 20a . pep SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 

25 | | | | : | | | M M | | | | | | | | | | | | | | | | | | | | | | | : | | | I I I | I I I I I I I I I I I I I I I I I 

orf 20-1 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf20a.pep EQFSALLDWGLRXCMLLTLPAAVGMAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

30 | | | | | I | | | | M | | | | | | | | | | | : | | | | | | | | | | | | | | I I I I I I I I I I I I I I I I I I I I I 

orf 20 - 1 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 20a. pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

35 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | II I I I I I I I I I I I I I I I 

orf 20-1 LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 20a. pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMGGGLYAAQIWLPFDWAHAGGMQKAA 

40 | | | | | | | | | | | | | | || | M | | | | | | | | | | | | | | | | | | | : I I I : II I : I I I I I I I : I I : 

orf 20-1 NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 

430 440 450 460 470 480 

490 500 510 

orf 20a . pep RLFILIAVGGGLYFASLAALGFRPRHFKRVESX 
45 : | | | | | | | | | | | | | | || | | | | | | | | | | | | | : | 

orf 20-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 

490 500 510 
Homology with a predicted ORF from N. gonorrhoeae

ORF20 (SEQ ID NO: 114) shows 92.1% identity over a 454aa overlap with a predicted ORF 
(ORF20ng) (SEQ ID NO: 120) from N. gonorrhoeae: 



orf20.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
orf20ng   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60

orf20.pep AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
orf20ng   AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120

orf20.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
orf20ng   ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180

orf20.pep YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
orf20ng   YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240

orf20.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
orf20ng   SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300

orf20.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
orf20ng   EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360

orf20.pep LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420
orf20ng   LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420

orf20.pep NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454
orf20ng   NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454

An ORF20ng nucleotide sequence (SEQ ID NO: 119) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 120): 



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA
101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI
151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC
451 SRSP*



Further DNA sequence analysis revealed the following DNA sequence (SEQ ID NO: 121): 
1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA

251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG 

301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT 

351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA

401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA

451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT

501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG 

601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta 

801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG 

851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG

1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT 

1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC 

1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG

1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA

1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC

1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC

1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA 

This encodes the following amino acid sequence (SEQ ID NO: 122; ORF20ng-1):



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA

101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI

151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ

201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN

251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT

301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT

351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA

401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL

451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL

501 GFRPRHFKRV ES* 
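
The correspondence between a nucleotide sequence and the amino acid sequence it encodes, such as SEQ ID NO: 121 and SEQ ID NO: 122, can be checked by translating the open reading frame with the standard genetic code. The following is a minimal Python sketch under stated assumptions, not the software used to generate the sequences above: the names `CODON_TABLE` and `translate` are illustrative, the sequence is assumed to be unambiguous, and it is read in frame from its first base.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of this
# sketch; the text does not name the translation table that was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace and case are ignored."""
    seq = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(seq) - 2, 3):
        # Codons that are not plain A/C/G/T triplets are reported as X.
        residues.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(residues)


# First 30 bases and last 9 bases of SEQ ID NO: 121 as printed above.
print(translate("ATGAATATGC TTGGAGCTTT GGCAAAAGTC"))  # MNMLGALAKV
print(translate("GAAAGCTGA"))                         # ES*
```

The two printed fragments agree with the start and end of SEQ ID NO: 122 as listed above.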


ORF20ng-1 (SEQ ID NO: 122) and ORF20-1 (SEQ ID NO: 116) show 95.7% identity in 512 aa
overlap:



10 20 30 40 50 60 

orf20-1.pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

M 1 1 1 1 1 1 1 1 i 1 1 M 1 1 1 1 1 1 1 M 1 1 '1 1 1 1 1 1 II II 1 1' I II 1 1 1 : 1 II 1 1 1 M 1 1 1

orf20ng-1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF

10 20 30 40 50 60 






70 80 90 100 110 120 

orf20-1.pep AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD
I i ! I I I I I I I I I I : I I ! [ I I I [ I I :: I I I I I I I I I I I I I I I : : I

orf20ng-1 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD

70 80 90 100 110 120

130 140 150 160 170 180 

orf20-1.pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVSFIVFALFFVP

I I I I I II: II ! I M I I I II I II I I I I I hi I I I I I i I I i I I I I I I h I I I I I I I I I
orf20ng-1 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP

130 140 150 160 170 180 



190 200 210 220 230 240 

orf20-1.pep YFDPPVTALAWAVFVGGILQLGFQLPWl^KiGFLKIiPKLSFKDAAVNRVMKQMAPAILGV

I I I I I I I I I I I I I I I I I I I II I I I I I I 11 I I I I I I I I I h I I I I I I I I I I I I I I I I I I I I
orf20ng-1 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV

190 200 210 220 230 240 



250 260 270 280 290 300 

orf20-1.pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT

Ml hill MM I MINIM III MINIUM 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

orf20ng-1 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT

250 260 270 280 290 . 300 



310 320 330 340 350 360 

orf20-1.pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

I I I I I I I I I I I I I I I I I I I I I hi I I I I I I I I I I I I I I M I I I 1 1 1 1 1 II I I M I I I
orf20ng-1 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG

310 320 330 340 350 360 



370 380 390 400 410 420 

orf20-1.pep LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI

MINIUM I I h I I I h I I II I I I I II I h I I I I I I I I h :hll I I I I I I I I
orf20ng-1 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI
370 380 390 400 410 420 

430 440 450 460 470 480

orf20-1.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG

I I I I h I I h ■ I I hi hi I I I I I I I II hi I I I I I I I il II I IMMIIMIIII
orf20ng-1 NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG
430 440 450 460 470 480 

490 500 510 

orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I h I
orf20ng-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVESX
490 500 510 

In addition, ORF20ng-1 (SEQ ID NO: 122) shows significant homology with a virulence factor
(SEQ ID NO: 1122) of S. typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524
Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220

Identities = 309/467 (66%), Positives = 368/467 (78%)

Query: 1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
Sbjct: 14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Query: 61 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120
+QAFVPILAEYK + +EAT F+ +V+G+L+ L VVT G+LAAPWVI V+APGF
Sbjct: 74 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Query: 121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180
ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P
Sbjct: 134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Query: 181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 240
YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D RV+KQM PAILGV
Sbjct: 194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Query: 241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300
SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ +
Sbjct: 254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Query: 301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 360
+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G
Sbjct: 314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Query: 361 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420
LIGLI++KVLA GFY+RQ+IKTPVKIAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+
Sbjct: 374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Query: 421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 467
NA LL++ LRK I+ P GW VM L+ +P
Sbjct: 434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480

Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%)

Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509
EW+ + + +L ++ G YFA+LA LGF+ + F R
Sbjct: 481 EWSQGSMLWRLLRLMAVVIAGIAAYFAALAVLGFKVKEFVR 521



Based on this analysis, including the homology with a virulence factor (SEQ ID NO: 1122) from
S. typhimurium, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 15 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 123):

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA
51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG
101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC
151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT
201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA
251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC
301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA
351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC
401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC
451 GTCAATGCGA tGGACACCAA TCCG..

This corresponds to the amino acid sequence (SEQ ID NO: 124; ORF22):

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNP..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 125): 



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT 

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA 

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT 

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA

This corresponds to the amino acid sequence (SEQ ID NO: 126; ORF22-1): 



1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP 

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 

Further work identified the corresponding gene in strain A of N. meningitidis (SEQ ID NO: 127): 



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 
51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG
101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA 

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 128; ORF22a): 



1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 

The originally-identified partial strain B sequence (ORF22) (SEQ ID NO: 124) shows 94.2% 
identity over a 158aa overlap with ORF22a (SEQ ID NO: 128): 



10 20 30 40 50 60 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED

I I I I I I I I I I I I I I I I I : :i I I M I I M I I I II I I I I M IIIMIIIIIIIIIIIII
orf22a MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED
10 20 30 40 50 60

70 80 90 100 110 120

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR

II I I I I I I I I : I I I I I I I I I I I I i I i I I I I I I I I II I I I I I M I I I I I I I II I I
orf22a KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX
70 80 90 100 110 120

130 140 150

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP

I I M I I I I I I M I I I I I I I I I I I I I I I I I I II I I I
orf22a NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV
130 140 150 160 170 180

The complete strain B sequence (ORF22-1) (SEQ ID NO: 126) and ORF22a (SEQ ID NO: 128) 
show 94.9% identity in 447 aa overlap: 

10 20 30 40 50 60 

orf 22a . pep M I KI KKGLNLP I AGRPEQV I YDGP VI TEVALLGEE YAGMRPXMKVKEGDAVKKGQVLFED 

I II I IN II I M I I I I I - I ! I h I I I I I ! I I II I I I I I I I I I I I I I I II I I I I i I I 
orf 22 - 1 M I KI KKGLNLP I AGRPEQAVYDGPAI TEVALLGEE YAGMRPSMKVKEGDAVKKGQVLFED 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 22a . pep KKXPGWFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 

II IIIIMIMIIIIIIMIIIIMIMIII IIIMII IIIMIIIIII III! I 

orf 22 - 1 KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 22a . pep NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 

I I I i I I I I I I I : I I I I I I I I I I I I I I I I ! I I II I I M I I lh I : I I Ihl II 

orf 22-1 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 22a . pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

I I . I II 1 1 1 1 1 , 1 1 I 1 1 1 II I M I M M I ! I 1 1 1 1 1 1 : 1 1 I I I I 1 1 II 1 1 I 1 1 1 1 M 

orf 22 - 1 LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 






250 260 270 280 . 290 300 

orf 22a . pep NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADNRVI 

III IhMMMII IIIMIMIIMIMIIIMII IMUII IIIIIIIIIMMM 

orf 22 - 1 NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

250 260 270 280 290 300 

310 320 330- 340 350 360 

orf 22a . pep SGS VLNGA I TQGAHD YLGRYHNQ I S VI EEGRS KELFGWVAPQPDKYS I TRTTLGHFLKNK 

I I I I II I I I I ! I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I ' I I I I I I I I I I I I I 
orf 22-1 SGS VLNGAITQGAHDYLGRYHNQ IS VI EEGRS KELFGWVAPQPDKYS I TRTTLGHFLKNK 

310 320 330 340 350 360 






370 380 390 400 410 420 

or f 22a . pep LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
I I I I MM I I I I I I I I I I I I I I I I I I Ml I I I I I I I I I M I I I I I I I I I I I I I I I I I I I 
orf 22 - 1 LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 



430 440

orf 22a . pep LCS FVCPGKYEXGPLLRKVLETXEKEGX 

IIIMIIIIII I I I I I I ' I Mill 
orf 22 - 1 LCSFVCPGKYEYGPLLRKVLETIEKEGX 

430 440 
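
The identity figures quoted for the pairwise alignments above are, in essence, the number of aligned columns carrying the same residue divided by the length of the overlap. The following Python sketch illustrates that calculation on the first 60 columns of the ORF22-1/ORF22a alignment. It is an illustration only: the function name `percent_identity` is hypothetical, and the treatment of gaps and of the unknown residue X as non-identical is an assumption of this sketch, so the result need not reproduce the figures reported by the alignment program used for the text.

```python
def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity).

    Both rows must already be aligned, i.e. of equal length, with '-' for gaps.
    Assumption of this sketch: gap and X columns never count as identical.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(
        1
        for a, b in zip(row_a.upper(), row_b.upper())
        if a == b and a not in "-X"
    )
    overlap = len(row_a)
    return identical, overlap, 100.0 * identical / overlap


# Columns 1-60 of ORF22-1 (SEQ ID NO: 126) and ORF22a (SEQ ID NO: 128).
orf22_1 = "MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED"
orf22a = "MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED"

same, length, pct = percent_identity(orf22_1, orf22a)
print(f"{same}/{length} identical ({pct:.1f}%)")  # 56/60 identical (93.3%)
```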

Further work identified a partial gene sequence (SEQ ID NO: 129) from N. gonorrhoeae, which
encodes the following amino acid sequence (SEQ ID NO: 130; ORF22ng):
1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HN* 

Further work identified the complete gonococcal gene (SEQ ID NO: 131):



1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA

51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT 

201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC
301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA

351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC

401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC
451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT
501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 
551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 
601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 
651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA 
701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT 
751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG 
801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 
851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 
901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 
951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 132; ORF22ng-1):

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR
351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*



The originally-identified partial strain B sequence (ORF22) (SEQ ID NO: 124) shows 93.7% 
identity over a 158aa overlap with ORF22ng (SEQ ID NO: 130): 

orf22.pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60

M I I I I II I I I I I I I I I - I I I I I I M I I I I I I ■ h I I I Ml h M I : I I I I I I I I I
orf22ng MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60

orf22.pep KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120

I I I I I I I I I I I I ! I I II I I I I I I I I I II II I I I IMMMM MUM MMI
orf22ng KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 120

orf22.pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158

MINI llllll IIIIIIIIIIIIIIMIIMMII

orf22ng NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180



The complete sequences from strain B (ORF22-1) (SEQ ID NO: 126) and gonococcus (ORF22ng-1)
(SEQ ID NO: 132) show 96.2% identity in 447 aa overlap:






10 20 30 40 50 60 

orf 22 - 1 . pep MI KI KKGLNLP I AGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 

MM MIIMIIIIIIMMII IMMMMIIM MMMMIMMMMIMM 

orf22ng-l MI KI KKGLNLP I AGRPEQVI YDGPAITEVALLGEEYVGMRPSMKI KEGEAVKKGQVLFED 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 22 - 1 ..pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 

MMIMMMMMMIMM MMMMIMM MUM MIIIIMMMM 

orf22ng-l KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 22 - 1 . pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 

I M I II I III 1 1 1 1 1 M 1 1 1 M 1 1 1 1 IM 1 1 1 II M 1 1 1 II 1 1 1 1 1 1 1 II II 1 1 1 1 II 

orf22ng-l NL I QSGLWTALRTRPFSKI PA VDAEPFAIFVNAMDTNPLAADPTVI IKEAAEDFKRGLLV 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 22 - 1 . pep LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

M I M I MM 1 1 II 1 1 ! II 1 1 M M I M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M I II 1 1 1 1 1 II M 

orf 22ng-l LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 






250 260 27,0 280 290 300 

orf 22 - 1 . pep NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

IMIIIMI IIMIIMIIIMMII II IMIMMIMMIMMMIMM III 

orf 22ng- 1 NYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 22-1 .pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

M 1 1 1 II I M II 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 I II 1 1 1 II I 

orf22ng-l SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 22 - 1 . pep LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

II MIMI I II II 1 1 Ml II III I III MM M I Mill MUM llllll III 

orf22ng-l LFKFTTAVNGGDRAMVP IGTYERVMPLD I LPTLLLRDL I VGDTDS AQALGCLELDEEDLA 

370 380 390 400 410 420 






430 440 
orf22-1.pep LCSFVCPGKYEYGPLLRKVLETIEKEGX

I I M I I I I M II II I I I II I M II I I I
orf22ng-1 LCSFVCPGKYEYGPLLRKVLETIEKEGX
430 440

Computer analysis of these sequences gave the following results: 

Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession
number U24492) (SEQ ID NO: 1123).

ORF22 (SEQ ID NO: 124) and this 48kDa protein (SEQ ID NO: 1123) show 72% aa identity in 
158aa overlap: 



orf22 1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
MI IKKGL+LPIAG P Q +++G + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22 61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
KKNPGVVFTAPASG + I +RGEKRVLQSVVI VE + + + I F RY LA+LS E+V+ +
48kDa 61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22 121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158





ORF22a (SEQ ID NO: 128) also shows homology to the 48kDa Actinobacillus pleuropneumoniae
protein (SEQ ID NO: 1123): 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449

Score = 530 bits (1351), Expect = e-150

Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%)

Query: 1 MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60
MI IKKGL+LPIAG P QVI++G + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 61 KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 120
KK PGVVFTAP SG + I +RGEKRVLQSVVI VEG+ + + I F RY LA+LS +
Sbjct: 61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 121 NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVVVIKEAXXDFRRXXLV 180
NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP VV+KE DF+ V
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEVVLKEYETDFKDGLTV 180

Query: 181 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237
L+RL ++ + ++CK A + ++P S I F G HPAGL GTHIHF++PVGA K V
Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240

Query: 238 WTINYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDADN 297
W +NYQDVIAIG+LF TG L T+R+I+L G QV PRL+RT LGA +SQ+TA EL +N
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300

Query: 298 RVISGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 357
RVISGSVL+GA G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360

Query: 358 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 417
K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419

Query: 418 XXXXXSFVCPGKYEXGPLLRKVLETXEKEG 447
++VCPGK GP+LR LE EKEG

ORF22ng-1 (SEQ ID NO: 132) also shows homology with the OMP (SEQ ID NO: 1123) from
A. pleuropneumoniae:



gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus
pleuropneumoniae] Length = 449
Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

Query: 27 MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 86
MI IKKGL+LPIAG P QVI++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED
Sbjct: 1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 87 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 146
KKNPGVVFTAPASG + I +RGEKRVLQSVVI VEG+ + + I F RY LA LS+E+V++
Sbjct: 61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

Query: 147 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 206
NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNPLAADP V++KE DFK GL V
Sbjct: 121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNPLAADPEVVLKEYETDFKDGLTV 180

Query: 207 LSRL--TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 263
L+RL + + + + +CK A + ++P S I F G HPAGL GTHIHF++PVGA K V
Sbjct: 181 LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240

Query: 264 WTINYQDVIAIGRLFVTGRLNTERVVALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADN 323
W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N
Sbjct: 241 WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300

Query: 324 RVISGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 383
RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF
Sbjct: 301 RVISGSVLSGATAAGPVDYLGRYALQVSVLAEGREKELFGWIMPGSDKFSITRTVLGHFG 360

Query: 384 KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 443
K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ
Sbjct: 361 K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419

Query: 444 XXXXXSFVCPGKYEYGPLLRKVLETIEKEG 473
++VCPGK YGP+LR LE IEKEG
Sbjct: 420 DLALCTYVCPGKNNYGPMLRAALEKIEKEG 449 
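
The percentages printed beside the Identities, Positives and Gaps ratios in the database search output above can be recomputed from the ratios themselves. The Python sketch below does this for one summary line copied from the output. It is illustrative only: the helper name `check_percentages` is hypothetical, and the assumption that the printed percentage is the integer part of the ratio (truncation rather than rounding) is an inference from the figures in this summary line, not a documented property of the search program.

```python
import re

# Matches fragments such as "Identities = 284/450 (63%)".
RATIO = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")


def check_percentages(summary_line: str):
    """Return (label, recomputed %, printed %, agree?) for each ratio found."""
    results = []
    for label, num, den, printed in RATIO.findall(summary_line):
        # Assumption: the printed value is the truncated integer percentage.
        recomputed = int(num) * 100 // int(den)
        results.append((label, recomputed, int(printed), recomputed == int(printed)))
    return results


line = "Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)"
for row in check_percentages(line):
    print(row)
```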



Based on this analysis, including the homology with the outer membrane protein (SEQ ID NO:
1123) of Actinobacillus pleuropneumoniae, it was predicted that these proteins from N. meningitidis
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

ORF22-1 (SEQ ID NO: 126) (35.4kDa) was cloned in pET and pGex vectors and expressed in 
E. coli, as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figure 5A shows the results of affinity purification of the GST-fusion protein, and
Figure 5B shows the results of expression of the His-fusion in E. coli. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis 
(Figure 5C). These experiments confirm that ORF22-1 (SEQ ID NO: 126) is a surface-exposed 
protein, and that it is a useful immunogen. 

Example 16 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 133): 



1 . . GCGnCGnAAA TCATCCATCC CC . . nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG 

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT
451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC
501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 
551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT 
601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 
651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 
701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC 
751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 
801 GrkCmmmTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 
851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 
901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 
951 TCCCGCACCT TAA 
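
The partial sequence above contains lower-case ambiguity symbols (n, m, y, s, w, k, r), and the corresponding amino acid sequence shows X at the affected positions. One way to arrive at such a translation is to expand each ambiguous codon into every codon it could represent and to report X unless all of them encode the same residue. The Python sketch below illustrates this under stated assumptions: the symbols are taken to be the standard IUPAC nucleotide codes, the names `translate_codon` and `translate_ambiguous` are hypothetical, and the text does not state that the sequences above were translated in this way.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes (assumed meaning of the symbols in the listing).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon: str) -> str:
    """Translate one codon; return X if the possible codons disagree."""
    options = [IUPAC.get(base) for base in codon.upper()]
    if None in options:
        return "X"  # unrecognised symbol, e.g. '.' marking unread bases
    residues = {CODON_TABLE["".join(c)] for c in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1, ignoring whitespace."""
    seq = "".join(dna.split())
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


# First 18 bases of SEQ ID NO: 133, and the codons around 'my' at positions 433-434.
print(translate_ambiguous("GCGnCGnAAA TCATCCAT"))  # AXXIIH
print(translate_ambiguous("AATGCGmyGGCCGAA"))      # NAXAE
```

An ambiguous base does not always produce X: under the same rule a codon such as GCn still reads as alanine, because all four possible codons encode it.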



This corresponds to the amino acid sequence (SEQ ID NO: 134; ORF12):

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI

301 WVFVLGLPVG PGAPTFYPAP*

Further sequence analysis revealed the complete DNA sequence (SEQ ID NO: 135) to be:

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA

51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC 

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC 

1551 ATTCTATCCC GCACCTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 136; ORF12-1):

1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS

51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL

101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY

151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT

201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS

251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH

301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES

351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI

401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS

451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF

501 CIWVFVLGLP VGPGAPTFYP AP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)



ORF12 (SEQ ID NO: 134) shows 96.3% identity over a 320aa overlap with an ORF (ORF12a)
(SEQ ID NO: 138) from strain A of N. meningitidis:






10 20 30 

orf 12. pep AXXIIHPXXWGPEANWFFMVASTFVIALI 

I I I I I I MMMIIIMMMIM 
or f 12a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALI 
180 190 200 210 220 230 



40 50 60 70 80 90 

orf 12 . pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 

I Mill Ml II II Ml III 1 1 Mill II I II INI Ml 1 1 M Ml lllillll I MM 

orf 12a GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 
240 250 260 270 280 290 



100 110 120 130 140 150 

orf 12 .pep PADG I LRHPETGLVSGS PFLKS I WF I FLLFALPG I VYGRVTRS LRGEQE WNAXAESMS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I 
orf 12a PADG I LRHPETGLVSGS PFLKS I VVF I FLLFALPG I VYGRVTRS LRGEQEVVNAMAESMS 

300 310 320 330 340 350 



160 170 180 190 200 210 

orf 12 . pep TLXLXLXX I FFAAQFVAFFNWTN I GQY I AVKGATFLKEVGLGGS VLF IGF I L I CAF INLM 

II I I Ml MMMMMMMMMMMIIMM MIIMIIIMI llllll 

orf 12a - TLGL YLVI I FFAAQFVAFFNWTN I GQY I AVKGATFLKEVGLGGS VLF I GF I L I CAF INLM 

360 370 380 390 400 410 



220 230 240 250 . 260 270 

orf 12 .pep IGSASAQWAVTAPI FVPMLMLAGYAPEVIQAAYRIGDSVTNI ITPMMSYFGLIMATVXXY 

Ml Ml Ml II 1 1 Mill IMM III MIMI MUM M 1 1 1 MM IMMMI I 

orf 12a IGSASAQWAVTAPI FVPMLMLAGYAPEVIQAAYRIGDSVTNI ITPMMSYFGLIMATVIKY 

420 430 440 450 460 470 



280 290 300 310 320 

orf 12 . pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orf 12a KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

480 490 500 510 520 



The complete length ORF12a nucleotide sequence (SEQ ID NO: 137) is: 



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA 

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG 
1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC 
1551 ATTCTATCCC GCACCTTAA 



This encodes a protein having amino acid sequence (SEQ ID NO: 138): 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS
51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGAPTFYP AP*
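
The correspondence between a nucleotide listing and the protein it encodes can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and a reading frame that begins at the first base of the pasted text. The function name `translate` and the table construction are illustrative and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order (illustrative helper).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a listing fragment, ignoring coordinates and spaces.

    Codons that contain an ambiguous base are reported as 'X'.
    Any trailing partial codon is dropped.
    """
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First listing line of the ORF12a nucleotide sequence (SEQ ID NO: 137).
first_line = "1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA"
print(translate(first_line))
```

Applied to the first listing line of SEQ ID NO: 137, the sketch yields the first sixteen residues of SEQ ID NO: 138. Lines that do not start on a codon boundary would need the frame offset handled separately.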



ORF12a (SEQ ID NO: 138) and ORF12-1 (SEQ ID NO: 136) show 99.0% identity in 
overlap: 



10 20 30 40 50 60 

orf 12a. pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK 

i 1 1 1 1 1 1 1 : 1 M I , I M 1 1 1 1 1 1 M 1 1 1 1 M 1 1 M I M 1 1 1 h I i 1 1 1 1 1 1 1 1 1 1 M 

orf 12 - 1 MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFI I FI VLLLI ASAVGAYFGLSVPDPRPVGAK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 12a . pep GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 

Ml IMMMMMMIMI MM MM IMMMMIIIIMII MIMIIM 

orf 12 - 1 GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAE KSGLISALMR 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 12a. pep LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAIIFHSLGRHPLAGLAAAFAGVS 

Mill IMIIIIIIIIIIIIII MM MM MMMMMM MMMMMM 

orf 12-1 LLLTKSPRKLTTFMWFTGILSNTASELGYWLIPLSAIIFHSLGRHPLAGLAAAFAGVS 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 12a. pep GGYS ANLFLGT I DPLLAG I TQQAAQ 1 1 HPDYWGPEANWFFMVAS TFV I AL I GYFVTEKI 

MMI MMMMMM MM MM MM MMMMMM MMMMMM 

orf 12-1 GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 

190 200 210 220 230 240 



orf 12a .pep 



250 260 270 280 290 300 

VEPQLGPYQSDLSQEEKD I RHSNE I TPLEYKGL I WAGWFVALSALLAWS IVPADGILRH 

Ml III llllllllll IMIIIIIMIIIillllll MINI MIMII MMIIIMS 
orf 12 - 1 VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12a . pep PETGLVSGS PFLKS I WF I FLLFALPG I VYGRVTRSLRGEQE WNAMAESMSTLGLYLVI 

1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 II 

orf 12-1 PETGLVSGS PFLKS I WF I FLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 12a . pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

orf 12 - 1 IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 12a. pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

I I II I I I I I I I I I I I I I I I I I I I I I M II I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 12-1 AVTAP I FVPMLMLAGYAPEVIQAAYR I GDS VTN I ITPMMS YFGL IMATVI KYKKDAGVGT 

430 440 450 460 470 480 






490 500 510 520 

orf 12a .pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

M IMI III IIMMI MIMI1IMIIII IIIMIIIMIM 

orf 12 - 1 LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

490 500 510 520 



Homology with a predicted ORF from N. gonorrhoeae






ORF12 (SEQ ID NO: 134) shows 92.5% identity over a 320aa overlap with a predicted ORF 
(ORF12.ng) (SEQ ID NO: 140) from N. gonorrhoeae: 



orf12.pep                                AXXIIHPXXVVGPEANWFFMVASTFVIALI   30
                                         |  ||||  ||||||||||| |||||||||
orf12ng    AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALI  232

orf12.pep  GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV   90
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng    GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV  292

orf12.pep  PADGILRHPETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAXAESMS  150
           |||||||||||||| ||||||||||||||||||||||||| ||||||| ||||| |||||
orf12ng    PADGILRHPETGLVAGSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMS  352

orf12.pep  TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM  210
           || | |  ||||||||||||||||||||||||| |||   ||||||||||||||||||||
orf12ng    TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM  412

orf12.pep  IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY  270
           ||||||||||||||||||||||| || ||||||||||||||||||||||||||||||  |
orf12ng    IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY  472

orf12.pep  KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP  320
           |||||||||| ||||||||||||||||||||||||||||||| ||||| |
orf12ng    KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP  522
The complete length ORF12ng nucleotide sequence (SEQ ID NO: 139) is:



1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tctgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATG AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 
651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA 

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC
1351 GTTACCAATA TTATTAGGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC
1551 ATTCTATCCG GTGCCTTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 140): 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS
51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGTPTFYP VP*

ORF12ng (SEQ ID NO: 140) shows 97.1% identity in 522 aa overlap with ORF12-1 (SEQ ID NO: 
136): 

                     10        20        30        40        50        60
orf12-1.pep  MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
             |||||  | |||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng      MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
                     10        20        30        40        50        60

70 80 90 ' 100 110 120 

orf 12 - 1 . pep GRADDGLI YIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
I I I I I I I I- I I M :| I h I I I I I M I I I I I I I M I I I I I I I I ' MM: I I i I M I 
orf 12ng GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 12 - 1 . pep LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAI I FHSLGRHPLAGLAAAFAGVS 

II I I I I I I II I II I I I I I I I I I I I I I I I I I I I I III I hi I I I I i I I : M II h I I I I I I 

orf 12ng LLLTKSPRKLTTFMWFTGILSNTASELGYVVLIPLSAVI FHSLGRHPLAGLAAAFAGVS 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 12-1 .pep GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMVASTFVIALIGYFVTEKI 

IIIIIIIIMIIIIIIMIIMMIIIMMIIIIIIIII Ml I'l I l.nill 

orf 12ng GGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALIGYFVTEKI 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 12-1 .pep VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 M 1 1' I 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 

orf 12ng VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIVPADGILRH 

250 260 270 280 290 300 






310 320 330 340 350 360 

orf 12 - 1 . pep PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAMAESMSTLGLYLVI 

I I I I M: I II I I I I I I I I I M I I I I I I I I . I hi I I I , I : I I I I I I I I M II I I I I I I I 
orf 12ng PETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMSTLGLYLVI 

310 320 330 340 350 360 






370 380 390 400 ■ 410 420 

orf 12-1 .pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

IIIIIIIIIIIIIIMIIIIMIhl-lil III M I IIIIIIIIIMIIIIIII 

orf 12ng IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 






430 440 450 460 470 480 

orf 12 - 1 . pep AVTAP I FVPMLMLAGYAPEVI QAAYR I GDS VTNI I T PMMS Y FG L I MAT V I K Y KKDAG VGT 

IMIIIIIIIIIIIIIIIIMIII I lllllil I I lllllllllllllllllll 

orf 12ng AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

430 440 450 460 470 480 






490 500 510 520 

orf 12-1. pep LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX 

Mi Mil MINIMI M III H Ml I- Nihil 

orf 12ng LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX 

490 500 510 520 



In addition, ORF12ng (SEQ ID NO: 140) shows significant homology with a hypothetical protein
(SEQ ID NO: 1124) from E. coli:






sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89
Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)

Query: 8 RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67 

+SG+ VE +GN +PHP +A+ + FG+S +P D 
Sbjct: 13 QSGKLYGWVERIGNKVPHPFLLFIYLIIVLMVTTAILSAFGVSAKNP TDGTP 64 

Query: 68 IHVVSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127

+ V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + + 

Sbjct: 65 WVKNLLSVEGLHWFLPNVIKNFSGFAPLGAILAL^ 124 

Query: 128 RKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVSGGYSANL 187
+ ++MV+F S+ +S+ V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL
Sbjct: 125 ARYASYMVLFIAFFSHISSDAALVIMPPMGALIFLAVGRHPVAGLLAAIAGVGCGFTANL 184

Query: 188 FLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKIVEPQLGP 247

+ T D LL+GI+ +AA +P V NW+FMA+S V+ ++G +T+KI+EP+LG 
Sbjct: 185 LIVTTDVLLSGISTEAAAAFNPQMHVSVIDNWYFMASSVWLTIVGGLITDKIIEPRLGQ 244 

Query: 248 YQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRHPETGLVA 307
+Q + ++ + + S GL AGW + A +A ++P +GILR P V

Sbjct: 245 WQGNSDEKLQTLTESQRF GLRIAGWSLLFIAAIALMVIPQNGILRDPINHTVM 298 

Query: 308 GSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLXXXXXXXXX 367

SPF+K IV I L F + + YG TR++R + ++ + M E M + ++ 
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358 

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427

NW+N+G++IAV L+ GL G F+G L+ +F+ + I S SA W++ APIF 

Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418 

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487
VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514

Y FL+ W+ + W +++GLP+GPG 
Sbjct: 479 YPLIFLWWLLMLIoAW- YLVGLPIGPG 504 
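
The summary statistics reported for this alignment can be re-derived from the raw counts. The sketch below is illustrative only: the helper name `parse_blast_stats` is hypothetical, and it assumes the percentages are obtained by integer truncation of count/length, which is consistent with the figures quoted above (for example, 15/507 is reported as 2%).

```python
import re


def parse_blast_stats(line):
    """Return {name: (count, length, truncated_percent)} for a statistics line."""
    stats = {}
    for name, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        count, length = int(num), int(den)
        stats[name] = (count, length, 100 * count // length)
    return stats


line = "Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)"
stats = parse_blast_stats(line)
for name, (count, length, pct) in stats.items():
    print(f"{name}: {count}/{length} = {pct}%")
```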

Based on this analysis, including the presence of several putative transmembrane domains and the
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein, it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 17 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 141):

1 ..ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA
51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC
101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA
151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG
201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT
251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA
301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG
351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT
401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC
451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG
501 ACT..

This corresponds to the amino acid sequence (SEQ ID NO: 142; ORF14):

1 ..TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA
51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI
101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG
151 RXLTNPTVSV RIMLHSG..

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF14 (SEQ ID NO: 142) shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) 
(SEQ ID NO: 144) from strain A of N. meningitidis: 

10 20 30 

orf 14 . pep TAGAAGXXVFVFVTDSQVEVFGN I QTAVET 

20 ' I : I I I I I I I II I I : I - I I II : I I I I I 

orf 14a GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET 

150 160 170 180 , 190 200 

40 50 60 70 80 90 

orf 14 . pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 

25 I 1 1 I II I 1 1 I 1 1 I I I I I I 1 1 1 1 1 1 I 1 1 I I 1 1 1 1 I 1 1 1 1 I I I I I 1 1 1 1 1 1 1 1 I I II 1 1 1 I 

orf 14a GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
210 220 230 240 250 260 

100 110 120 130 140 150 

orf 14 .pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 

30 I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 14a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320 

160 

orf 14 .pep RXLTNPTVSVRIMLHSG 

35 I II II 1 1 Mill MM 

orf 14a RSLTNPTVSVRIMLHSGLMYSRRAWSSVAKSWSFAYMPDLVSRLNRLDLPTLVX 
330 340 350 360 370 380 
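
The percent-identity figures quoted for these overlaps can be reproduced by counting matching positions in an aligned pair. The sketch below is a simplified illustration: `percent_identity` is a hypothetical helper, it assumes the two strings are already aligned to equal length, and it treats an undetermined residue (`X`) or a gap (`-`) as a non-match, which may differ in detail from the alignment program used for the original analysis.

```python
def percent_identity(a, b):
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x not in "X-")
    return 100.0 * matches / len(a)


# Last block of the ORF14 / ORF14a comparison (17 aligned residues).
orf14_tail = "RXLTNPTVSVRIMLHSG"
orf14a_tail = "RSLTNPTVSVRIMLHSG"
print(round(percent_identity(orf14_tail, orf14a_tail), 1))

# Arithmetic check for the whole overlap: 157 identities in 167 residues.
print(round(100.0 * 157 / 167, 1))
```

Under these assumptions, 157 identical positions out of 167 correspond to the 94.0% figure stated for ORF14 and ORF14a.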

The complete length ORF14a nucleotide sequence (SEQ ID NO: 143) is: 

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG 

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 
351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG
451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 
501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 
551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 
601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 
651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 
701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 
751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 
801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 
851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG 
901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 
951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC 

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC 

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG 
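
Because each listing line begins with the coordinate of its first residue, a listing can be checked for dropped or duplicated residues by comparing each line's length with the next line's coordinate. The following is a minimal sketch of such a check; `parse_listing` and `coordinate_gaps` are hypothetical helper names, and the input is assumed to follow the "coordinate, then space-separated groups" layout used in this document.

```python
import re


def parse_listing(text):
    """Map each line's start coordinate to its residues (spaces removed)."""
    blocks = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+)\s+([A-Za-z*.\s]+?)\s*$", line)
        if m:
            blocks[int(m.group(1))] = re.sub(r"\s+", "", m.group(2))
    return blocks


def coordinate_gaps(blocks):
    """Return start coordinates that do not follow on from the previous line."""
    starts = sorted(blocks)
    return [nxt for cur, nxt in zip(starts, starts[1:])
            if cur + len(blocks[cur]) != nxt]


# Last three listing lines of the ORF14a nucleotide sequence (SEQ ID NO: 143).
listing = """
1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC
1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC
1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG
"""
blocks = parse_listing(listing)
total_length = 1101 + len(blocks[1101]) - 1
print(coordinate_gaps(blocks), total_length, total_length % 3)
```

For these lines the coordinates are consistent, and the total length of 1149 bases is a multiple of three, as expected for 382 codons plus a stop codon.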

This encodes a protein having amino acid sequence (SEQ ID NO: 144): 



1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF
51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK
101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR
151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG
201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF
251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA
301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR
351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV*

It should be noted that this sequence includes a stop codon at position 118.
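
The position of such an internal stop can be located directly in the translated sequence. The sketch below is illustrative: `internal_stops` is a hypothetical helper, and the residues are the first 120 positions of SEQ ID NO: 144, with the group KADFAVVPDD read as ten residues (an assumption based on the corresponding codons of SEQ ID NO: 143).

```python
def internal_stops(protein):
    """Return 1-based positions of '*' that are not the terminal stop."""
    seq = "".join(protein.split())
    body = seq[:-1] if seq.endswith("*") else seq
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


# Residues 1-120 of ORF14a (SEQ ID NO: 144), grouped in tens as in the listing.
orf14a_start = (
    "MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF "
    "LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK "
    "LLFDQPDAGG AGDAAEH*NR"
)
print(internal_stops(orf14a_start))
```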
Homology with a predicted ORF from N. gonorrhoeae 

ORF14 (SEQ ID NO: 142) shows 89.8% identity over a 167aa overlap with a predicted ORF
(ORF14.ng) (SEQ ID NO: 146) from N. gonorrhoeae: 

orf14.pep                                TAGAAGXXVFVFVTDSQVEVFGNIQTAVET   30
                                         || |||  || || | |  |||| | ||||
orf14ng    GRQFGFFRVGGASFVITAQAGIDDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET  208

orf14.pep  GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS   90
           ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf14ng    GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS  268

orf14.pep  VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG  150
           ||||||||||| |||||||||||||||||| ||||||||||||| ||| |||||||||||
orf14ng    VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG  328

orf14.pep  RXLTNPTVSVRIMLHSG  167
           | ||||||||||||| |
orf14ng    RSLTNPTVSVRIMLHAGLMYSRRAVVSRVAKSWSFAYMPDLVSRLNRLDLPTLV  382

The complete length ORF14ng nucleotide sequence (SEQ ID NO: 145) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 146): 
1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF
51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK
101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR
151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG
201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF
251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA
301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR
351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 

Example 18 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 147):

1 . . GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT 

51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA 

101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG 

151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC 

201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA
251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC
301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC

351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG
401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG 
451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC 
501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC. 

This corresponds to the amino acid sequence (SEQ ID NO: 148; ORF16): 



1 . . GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 

51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV 

101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK

151 EYXPETYARY HGIDVAANQE KANWIALLKX A.. 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 149): 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC 

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC 

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 
751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT
801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA
851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG
901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC
951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG
1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT
1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG
1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG
1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT
1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC
1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC
1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG
1351 GTTTGA



This corresponds to the amino acid sequence (SEQ ID NO: 150; ORF16-1):



1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*



Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 






ORF16 (SEQ ID NO: 148) shows 96.7% identity over a 181 aa overlap with an ORF (ORF16a)
(SEQ ID NO: 152) from strain A of N. meningitidis:






10 20 30 

orf 16 . pep GH YS DRTWKPRLXGRR L P YLL YGTL I AV I V 

IMIIMIIIII lllllllllllllllll 
orf 16a IFQTLGADPHSLGW FFILPPLAGMLVQPIVG HYSDRTWKPRLGGR RLPYLLYGTLIAVIV 

50 60 70 80 90 100 






40 50 60 70 80 90 

orf 16 . pep M I LM PNSGSFGFGY AS LAALSFGALM I ALLDV SSNMAMQPFKMMVGDMVNEEQKXYAYG I 

II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Mill 

orf 16a MIL MPNSGSFGFGY ASLAALSFGALMIALLDV SSNMAMQPFKMMVGDMVNEEQKGYAYGI 
110 120 130 140 150 160 
100 110 120 130 140 150

orf 16 .pep QS FLANTG AWAA I LP FVFA Y I GLAN TAXKGVVPQ TVVVAFYVGAALLVI TS A FT I FKVK 

I I I I I I I I I I I I I M II I I I II I I I I I I I I I I I I M I I I; I I I I I I I I I I I I I 
orfl6a QS FLANTG AWAA I L P FVFA Y I GLAN T AE KGWPQT WVAFYVGAALLVITSA FT I FKVK 

170 180 190 200 210 220 



160 170 180 

orf 16 .pep E YXPETYARYHG I D VAANQEKANW I ALLKXA 

II I 111 1 . 1 1 1 ! 1 1 1 1 1 ! 1 1 1 1 1 MM 
orf 16a EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI
230 240 250 260 270 280 

or f 1 6 a AENVWHTTDAS S VGYQEAGNWYG VLAAVQS VAAV I CS FVLA KVPNKYHKAGY FGCLALGA 

290 300 310 320 330 340 


The complete length ORF16a nucleotide sequence (SEQ ID NO: 151) is: 

1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT 

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT 
501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA 

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG 

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG 

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA


This encodes a protein having amino acid sequence (SEQ ID NO: 152): 

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*

ORF16a (SEQ ID NO: 152) and ORF16-1 (SEQ ID NO: 150) show 99.6% identity in 451 aa 
overlap: 

10 20 30 40 50 60

orf 16a . pep MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF 

1 1 1 1 1 1 1 1 M I i 1 1 1 1 1 II 1 1 1 1 1 1 i 1 1 M M 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 
orf 16-1 MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 16a. pep ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS 

5 ! 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 1 1 1 1 i i 1 1 1 M I M 1 1 II li 1 1' 1 1 II 1 1 1 1 1 1 1 M 1 1 1 ! I 

orf 16 - 1 ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 16a . pep LAALSFGALMIALLDVSSNMAMQPFKMW^ 

10 IMMIIIIIMIIIIIMIIIII llllll MIMIIIIMM IMIIIIMII 

orf 16-1 LAALSFGALMIALLDVSSNMAMQPFKMMVGDMWEEQKGYAYGIQSFIJVNTGAVVAAILP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 16a . pep FVFAY I GLANTAEKGWPQTVWAFYVGAALLVI TS AFT I FKVKEYNPETYARYHG ID VA 

15 ' I II 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 M 1 1 M I M 1 1 1 M 1 1 1 1 1 1 1 1 1 1 :< 1 1 M 1 1 1 II 1 1 1 

orf 16 - 1 FVFAYIGLANTAEKGWPQTVWAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf46a . pep ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ 

20 1 1 Mill MM III 1 1 llllll II Ml III I III II MM I III III III 1 1 III I MM 

orf 16-1 ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 16a . pep EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV 

25 | 1 1 || || || || || 1 1| II II II II II II II II II II II II I II I I I I II I II I II II I I I 

or f 16 - 1 EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV 

310 320 330 340 ^ 350 360 

370 380 390 400 410 420 

orf 16a. pep LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG 

30 | || | || || | | || | || | | | | | || | | | || II II I II I I II I II I I I II I I I II II I I I I II I 

orf 16-1 LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG 

370 380 390 400 410 420 

430 440 450 

orf 16a .pep GLQATMFLVGGWLLLGAFSVFLIKETHGGVX 

35 | | || | | || | || || I I II II II I II II I I II II 

orf 16- 1 GLQATMFLVGGWLLLGAFSVFLI KETHGGVX 

430 440 450 
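
As a rough illustration of how an identity figure of this kind can be recomputed, the following Python sketch counts identical residues column by column over two aligned rows. The function name `percent_identity`, and the choice to count gap columns toward the overlap length, are assumptions made for this illustration and are not taken from the original analysis. The two rows are the first 60 residues of ORF16a and ORF16-1 as aligned above.

```python
def percent_identity(top, bottom, gap="-"):
    """Return the percentage of aligned columns holding identical residues.

    Assumption: every column counts toward the overlap length, including
    columns where one row carries a gap character.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != gap)
    return 100.0 * identical / len(top)


# First alignment block (residues 1-60) of ORF16a and ORF16-1.
orf16a = "MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF"
orf16_1 = "MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF"

# One mismatch (S versus N at position 55) in 60 columns.
print(round(percent_identity(orf16a, orf16_1), 1))
```

By the same arithmetic, two mismatches across a 451-residue overlap give 449/451, which rounds to the 99.6% quoted above.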

Homology with a predicted ORF from N. gonorrhoeae 

ORF16 (SEQ ID NO: 148) shows 93.9% identity over a 181aa overlap with a predicted ORF 
(ORF16.ng) (SEQ ID NO: 154) from N. gonorrhoeae: 

orf 16. pep GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 30 

hlMMIMM IIIIMIIIIIIIIIII 
orf 16ng HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 131 

orf 16 .pep MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAM^ 90 

1 1 1| 1 1 1 1 1 1 1| || 1 1 1 1 II 1 1 1 1 II 1 1 1 1 1 1 1 M 1 1 II II II 1 1 1 1 1 II II II 1 1 M I 
orf16ng MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 191 

orfl6.pep QS FLANTGAWAA I LPFVFAY IGLANTAXKGWPQTVWAFYVGAALLVI TSAFT I FKVK 150 

1 1 1 1 1 1 1 II IIIIIMMIMII III II MM I MM II III I IM III I II III 

orf 16ng QSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLIITSAFTISKVK 251 

orf 16. pep EYXPETYARYHGIDVAANQEKANWIALLKXA 181 

II II llllllll Mllllllh llhl 

orf 16ng EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI 311 

The complete length ORF16ng nucleotide sequence (SEQ ID NO: 153) is: 

1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA 

51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT 

101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT 

151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT 

201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG 

251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT 

301 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG 

351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT 

401 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC 

451 TTGTCGTTCG GCGCGCTGAT GATTGCGCTG TTGGACGTGT CGTCGAATAT 

501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC 

551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG 

601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA 

651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG 

701 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC 

751 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC 

801 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA 

851 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC 

901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA 

951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG 

1001 GCGTTTTGGC GGCGGTGTAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 154): 

1 MIGDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD 
51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD 
101 SGYYSDRTWK PRLGGRRLPY LLYGTLIAVI VMILMPNSGS FGFGYASLAA 

151 LSFGALMIAL LDVSSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA 
201 VVAAILPFVF AYIGLANTAE KGVVPQTVVV AFYVGAALLI ITSAFTISKV 
251 KEYDPETYAR YHGIDVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF 
301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV* 

ORF16ng (SEQ ID NO: 154) and ORF16-1 (SEQ ID NO: 150) show 89.3% identity in 261 aa 
overlap: 

                    30        40        50        60        70         80
orf16-1.pep   MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT
                                            |   |  |  |  ||        | |||||
orf16ng       DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT
             50        60        70        80         90       100

                     90       100       110       120       130       140
orf16-1.pep   WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16ng       WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA
             110       120       130       140       150       160

                    150       160       170       180       190       200
orf16-1.pep   MQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILPFVFAYIGLANTAEKGVVPQTV
              ||||||||||||||||| |||||||||||| |||||||||||||||||||||||||||||
orf16ng       MQPFKMMVGDMVNEEQKSYAYGIQSFLANTDAVVAAILPFVFAYIGLANTAEKGVVPQTV
             170       180       190       200       210       220

                    210       220       230       240       250       260
orf16-1.pep   VVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVAANQEKANWIELLKTAPKAFWT
              ||||||||||| ||||||| ||||||||||||||||||||||||||| |||||||| |||
orf16ng       VVAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT
             230       240       250       260       270       280

                    270       280       290       300       310       320
orf16-1.pep   VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS
              || ||||||||| |||||||||||||||||||||||| ||||| |||||||
orf16ng       VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX
             290       300       310       320       330       340

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 19 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 155): 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT . . . 



This corresponds to the amino acid sequence (SEQ ID NO: 156; ORF28): 

1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 
51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA XXTGILXAGL DKPFQIVXDT 
101 PSYXCHQALP VKLGSXGSQN... 
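
The relationship between the undetermined bases (N) in the partial DNA sequence and the X residues in ORF28 can be illustrated with a short Python sketch. It assumes the standard genetic code and treats any character other than A, C, G or T as an unknown base: the codon is expanded to every possible base at that position, and a residue is reported only if all expansions encode the same amino acid; otherwise X is emitted. The names `CODON_TABLE` and `translate` are hypothetical helpers introduced for this illustration, and only the first 150 bases of SEQ ID NO: 155 are used.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1, writing X where an unknown base is decisive."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        # Expand every non-ACGT character to all four bases.
        options = [b if b in BASES else BASES for b in codon]
        residues = {CODON_TABLE["".join(c)] for c in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Bases 1-150 of the partial sequence (SEQ ID NO: 155), spaces removed.
partial_dna = (
    "ATGTTGTTCCGTAAAACGACCGCCGCCGTTTTGGCGCATACCTTGATGCT"
    "GAACGGCTGTACGTTGATGTTGTGGGGAATGAACAACCCGGTCAGCGAAA"
    "CAATCACCCGNAAACACGTTGNCAAAGACCAAATCCGNGNCTTCGGTGTG"
)
print(translate(partial_dna))
```

In this sketch the codon CGN still translates to R, because all four CGN codons encode arginine, whereas GNC is ambiguous and yields X, matching residues 37, 41, 46 and 47 of the ORF28 listing.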



Further work revealed the complete nucleotide sequence (SEQ ID NO: 157): 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 


This corresponds to the amino acid sequence (SEQ ID NO: 158; ORF28-1): 

1 MLFRKTTAAV LAATLMLNGC TLMLWGMNNP VSETITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKPFQIVEDT 

101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS 

201 KLFANILYTP PFLILDAAGA VLALPAAALG AVVDAARK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF28 (SEQ ID NO: 156) shows 79.2% identity over a 120aa overlap with an ORF (ORF28a) 
(SEQ ID NO: 160) from strain A of N. meningitidis: 

                      10        20        30        40        50        60
orf28.pep     MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK
              |||||||||||| |||||||| | |||| | |||  |||| ||||| |||||||||||||
orf28a        MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf28.pep     GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN
              ||||||||||||||||||||  |||| ||||| || |   |     ||||||| |  |||
orf28a        GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN
                      70        80        90       100        110

orf28a        FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF
            120       130       140       150       160       170

The complete length ORF28a nucleotide sequence (SEQ ID NO: 159) is: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT 

51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA 

101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC 

301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG 

351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC 

401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC 

451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA 

501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC 

551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG 

601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT 

651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT 
701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 160): 

1 MLFRKTTAAV LAATLMLNGC TVMMWGMNSP FSETTARKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFVVNPEDSA KLTGILKAGL DKQFQMVEPN 
101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL 
151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK 
201 LFENIAYTPT TLILDAVGAV LALPVAALIA ATNSSDK* 

ORF28a (SEQ ID NO: 160) and ORF28-1 (SEQ ID NO: 158) show 86.1% identity in 238 aa 
overlap: 

                      10        20        30        40        50        60
orf28a.pep    MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGVVAEDNAQLEK
              ||||||||||||||||||||| | |||| | |||  ||||||||||||||||||||||||
orf28-1       MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK
                      10        20        30        40        50        60

                      70        80        90       100        110      119
orf28a.pep    GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN
              |||||||||||||||||||||||||||||||| || ||  |  |  |||||||||| |||
orf28-1       GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN
                      70        80        90       100       110       120

            120       130       140       150       160       170      179
orf28a.pep    FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF
              ||||||||||||| |||||||||| |||| ||||||||||||||||||||||||||||||
orf28-1       FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
                     130       140       150       160       170       180

            180       190       200       210       220       230
orf28a.pep    EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX
              |||||||||||||  |||||||| || |||  ||||| ||||||| ||| |      |
orf28-1       EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX
                     190       200       210       220       230

Homology with a predicted ORF from N. gonorrhoeae 

ORF28 (SEQ ID NO: 156) shows 84.2% identity over a 120aa overlap with a predicted ORF 
(ORF28.ng) (SEQ ID NO: 162) from N. gonorrhoeae: 

orf28.pep     MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGVVAEDNAQLEK  60
              |||||||||||| || ||||| || ||||||| ||||||| ||||| |||||||||||||
orf28ng       MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK  60

orf28.pep     GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN  120
              |||||||||||| |||||||  || | |||||||||| |||||  |||||||    ||||
orf28ng       GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN  120



The complete length ORF28ng nucleotide sequence (SEQ ID NO: 161) is: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT 

51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG 

351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA 

401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA 
451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 
501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG 
551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC 
601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC 
651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT 
701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 162): 

1 MLFRKTTAAV LAATLILNGC TMMLRGMNNP VSQTITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT 

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS 

201 KLFGNILYTP PLLILDAAAA VLVLPMALIA AANSSDK* 

ORF28ng (SEQ ID NO: 162) and ORF28-1 (SEQ ID NO: 158) share 90.0% identity in 231 aa 
overlap: 

                      10        20        30        40        50        60
orf28-1.pep   MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK
              ||||||||||||||| ||||| || ||||||| |||||||||||||||||||||||||||
orf28ng       MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGVVAEDNAQLEK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf28-1.pep   GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN
              |||||||||||| ||||||||||| ||||||||||||||||||||||||||| | |||||
orf28ng       GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf28-1.pep   FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
              ||| ||||||||  | |||||||| | |||||||||||||||||||||||||||||||||
orf28ng       FSTGGLCLRYDTGRPDDIAKLKQLEFKAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF
                     130       140       150       160       170       180

                     190       200       210       220       230      239
orf28-1.pep   EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX
              |||||||||||||| |||||||| ||||||| |||||| ||| || |   |
orf28ng       EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX
                     190       200       210       220       230

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF28-1 (SEQ ID NO: 158) (24kDa) was cloned in pET and pGex vectors and expressed in 
E. coli, as described above. The products of protein expression and purification were analyzed by 
SDS-PAGE. Figure 6A shows the results of affinity purification of the GST-fusion protein, and 
Figure 6B shows the results of expression of the His-fusion in E. coli. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for ELISA, which gave a positive result. These 
experiments confirm that ORF28-1 (SEQ ID NO: 158) is a surface-exposed protein, and that it may 
be a useful immunogen. 

Example 20 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 163): 



1 . . GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT . . 



This corresponds to the amino acid sequence (SEQ ID NO: 164; ORF29): 

1 ..VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV 
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 
101 TKTSIVPQAP FSDRWLEENA GAASG.. 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 165): 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 
51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 
101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 
151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 
201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 
251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 
301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 
351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 
401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 
451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA 
501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 
551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 
601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 
651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 
701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 
751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 
801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 
851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 
901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 
951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 
1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 

1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA 

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence (SEQ ID NO: 166; ORF29-1): 

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 

301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 

401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 

451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF29 (SEQ ID NO: 164) shows 88.0% identity over a 125aa overlap with an ORF (ORF29a) 
(SEQ ID NO: 168) from strain A of N. meningitidis: 

                                                    10        20        30
orf29.pep                                   VSPVLPITHERTGFEGVIGYETHFSGHGHE
                                            | | |||||||||||| |||||||||||||
orf29a        EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHE
                    50        60        70        80        90       100

                      40        50        60        70        80        90
orf29.pep     VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY
              |||||| ||||||||||||||||||||||||| ||||||| |||||   |||||||||||
orf29a        VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY
                   110       120       130       140       150       160

                     100       110       120
orf29.pep     SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG
                ||||||||||  ||| |||||||| ||||||||
orf29a        XXYVKGTSTKTKSNIVPRAPFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANR
                   170       180       190       200       210       220

orf29a        MDDIRGIVQGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA
                   230       240       250       260       270       280

The complete length ORF29a nucleotide sequence (SEQ ID NO: 167) is: 

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 

1151 GACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG 

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG 



This encodes a protein having amino acid sequence (SEQ ID NO: 168): 



1 MNXPIQKFMM LFAAAISXLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT 
251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS 
301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX 

351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG 
401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD 

451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL* 

ORF29a (SEQ ID NO: 168) and ORF29-1 (SEQ ID NO: 166) show 90.1% identity in 385 aa 
overlap: 



                      10        20        30        40        50        60
orf29a.pep    MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN
              || |||||||||||||| |||||||||||||||||||||||||||||||||||||||||
orf29-1       MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf29a.pep    RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG
              ||||||||||||| | |||||||||||| ||||||||||||||||||| |||||||||||
orf29-1       RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf29a.pep    GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTSTKTKSNIVPR
              ||||||||||||||||||||||||||||||||||||||||||  |||||||||| ||||
orf29-1       GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf29a.pep    APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDIRGIVQGAVNPFLMG
              ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||
orf29-1       APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf29a.pep    FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDSAFAVKDGINS
              |||||||||||||||||||||||||||| | || ||||||||||  ||||||||||||||
orf29-1       FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS
                     250       260       270       280       290       300

                     310       320       330       340       350       360
orf29a.pep    ARQWADAHPNITATAQTALAVAXAATTVWGGKKVELNPTKWDWVKNTGYXTPAVRTMHTL
              | |||||||||||||||||  | || ||| |||||||||||||||||||  || | | ||
orf29-1       AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL
                     310       320       330       340       350       360

                     370       380       390       400       410       420
orf29a.pep    DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS
              |||||||| | ||   || |    |
orf29-1       DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK
                     370        380       390       400       410

Homology with a predicted ORF from N. gonorrhoeae 

ORF29 (SEQ ID NO: 164) shows 88.8% identity over a 125aa overlap with a predicted ORF 
(ORF29.ng) (SEQ ID NO: 170) from N. gonorrhoeae: 



orf29.pep                                   VSPVLPITHERTGFEGVIGYETHFSGHGHE  30
                                            | | ||||||||||||||||||||||||||
orf29ng       EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE  102

orf29.pep     VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY  90
              |||||| ||||||||||||||||||||||||| ||||||| |||||   |||||||||||
orf29ng       VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY  162

orf29.pep     SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG  125
              ||  ||||||||   ||||||||||| ||||||||
orf29ng       SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR  222

The complete length ORF29ng nucleotide sequence (SEQ ID NO: 169) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 170): 

1 MNLPIQKFMM LFAAAISLLQ IPISHANGLD ARLRDDMQAK HYEPGGKYHL 
51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 
101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG 
151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 
201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT 
251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS 
301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK 
351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD 
401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA 
451 DGKINHRLFV PNQQLPEK* 

In a second experiment, the following DNA sequence (SEQ ID NO: 171) was identified: 



1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc 

51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA 

301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC 

451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA 

501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA 

651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA 

751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG 

851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA 

1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT 

1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA 

1251 TTTTCCAATA GGAACTGCAA CTTATGAAGA GGCAGATAGA CTAGGTAAAA 
1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT 

1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT 

1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA 
1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 172; ORF29ng-1): 



1 MNLPIQKFMM LLAAAISMLH IPISHANGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG 

151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK 
351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI 

401 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS 
451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR* 

ORF29ng-1 (SEQ ID NO: 172) and ORF29-1 (SEQ ID NO: 166) show 86.0% identity in 401 aa 
overlap: 

                      10        20        30        40        50        60
orf29ng-1.pep MNLPIQKFMMLLAAAISMLHIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN
              ||||||||||| ||||| | |||||||||||||||||||||||||||||||||||||||
orf29-1       MNLPIQKFMMLFAAAISLLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf29ng-1.pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG
              || |||||||||| | |||||||||||||||||||||||||||||||| |||||||||||
orf29-1       RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf29ng-1.pep GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ
              |||||||||||||||||||| ||||||||  || | ||||||||  |||||||| | |||
orf29-1       GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf29ng-1.pep APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG
              |||||||||||||||||| |||||||||||| || ||| |||||| |||||||||||| |
orf29-1       APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf29ng-1.pep FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS
              ||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||
orf29-1       FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS
                     250       260       270       280       290       300

                     310       320       330       340       350       360
orf29ng-1.pep ARQWADAHPNITATAQTALAVAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTV
              | |||||||||||||||||  ||||||||||||||||||||||||||||||||||||||
orf29-1       AKQWADAHPNITATAQTALSAAEAAGTVWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL
                     310       320       330       340       350       360

                     370        380       390       400       410      419
orf29ng-1.pep DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP
              |||||||| | ||   |          |
orf29-1       DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVKT
                     370       380       390       400       410       420

            420       430       440       450       460       470      479
orf29ng-1.pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY

orf29-1       RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN
                     430       440       450       460       470       480



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 173):

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC 
101 ACACGCGGGC AGATGCACCG ATGCAG...

This corresponds to the amino acid sequence (SEQ ID NO: 174; ORF30):

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ..
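
The correspondence between the partial nucleotide sequence and ORF30 can be checked by translating the reading frame codon by codon. The following is a minimal sketch, assuming the standard genetic code and a reading frame that starts at the first base; the names `CODON_TABLE`, `translate` and `SEQ_173_PARTIAL` are illustrative and are not part of the original disclosure.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons are written as 'X'."""
    seq = "".join(dna.split()).upper()      # drop the spacing used in the listing
    usable = len(seq) - len(seq) % 3        # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Partial sequence as printed for SEQ ID NO: 173 (126 nucleotides).
SEQ_173_PARTIAL = """
ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC
ACACGCGGGC AGATGCACCG ATGCAG
"""

print(translate(SEQ_173_PARTIAL))
```

Under these assumptions the 42 codons translate to the 42 residues listed for ORF30.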

Further work revealed the complete nucleotide sequence (SEQ ID NO: 175): 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This corresponds to the amino acid sequence (SEQ ID NO: 176; ORF30-1):

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*
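
A sequence described as complete, such as the 546-nucleotide sequence above (182 codons: 181 residues plus a stop), can be sanity-checked for the basic features of an open reading frame. The sketch below is only an illustration under simple assumptions (an ATG start codon, one of the three standard stop codons at the end, no internal stop); the function name `is_complete_orf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(dna):
    """Return True if dna is ATG ... stop, in frame, with no internal stop codon."""
    seq = "".join(dna.split()).upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_start = codons[0] == "ATG"
    has_stop = codons[-1] in STOP_CODONS
    internal_stop = any(codon in STOP_CODONS for codon in codons[1:-1])
    return has_start and has_stop and not internal_stop


# Toy inputs: a short complete frame, a frame without a stop codon,
# and a frame interrupted by an internal stop codon.
print(is_complete_orf("ATGAAAAAACAATAA"))
print(is_complete_orf("ATGAAAAAACAA"))
print(is_complete_orf("ATGTAAAAATAA"))
```

A partial sequence such as SEQ ID NO: 173, which ends before any stop codon, would fail this check, consistent with its description as partial.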

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF30 (SEQ ID NO: 174) shows 97.6% identity over a 42aa overlap with an ORF (ORF30a)
(SEQ ID NO: 178) from strain A of N. meningitidis:

                       10        20        30        40
orf30.pep      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
               |||||||||||||||||||||||||||||||:||||||||||
orf30a         MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP
                       10        20        30        40        50        60

orf30a         LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI
                       70        80        90       100       110       120
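
The identity figures quoted for each overlap can be reproduced by counting identical columns in the aligned region. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length and that gaps are written as `-`; the function name `percent_identity` is illustrative, and the alignment program actually used may count ambiguous residues differently.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


# The 42-residue overlap between ORF30 and the start of ORF30a.
orf30 = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ"
orf30a_start = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQ"

print(round(percent_identity(orf30, orf30a_start), 1))
```

The two rows differ only at position 32 (M versus V), so 41 of 42 columns are identical, which rounds to the 97.6% stated above.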



The complete length ORF30a nucleotide sequence (SEQ ID NO: 177) is: 



1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This encodes a protein having amino acid sequence (SEQ ID NO: 178):



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

ORF30a (SEQ ID NO: 178) and ORF30-1 (SEQ ID NO: 176) show 97.8% identity in 181 aa 
overlap: 

orf30a.pep     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP   60
               |||||||||||||||||||||||||||||||||||||||||||||||||||| | |||||
orf30-1        MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP   60

orf30a.pep     LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI  120
               | |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf30-1        LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI  120

orf30a.pep     KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR  180
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1        KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR  180

orf30a.pep     FX
orf30-1        FX

Homology with a predicted ORF from N. gonorrhoeae 

ORF30 (SEQ ID NO: 174) shows 97.6% identity over a 42aa overlap with a predicted ORF 
(ORF30.ng) (SEQ ID NO: 180) from N. gonorrhoeae: 

orf30.pep      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ                     42
               |||||||||||||||||||||||||||||||:||||||||||
orf30ng        MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP   60

The complete length ORF30ng nucleotide sequence (SEQ ID NO: 179) is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC
51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG
151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT
301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA
351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG
401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT
451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA
501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA

This encodes a protein having amino acid sequence (SEQ ID NO: 180):

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG
101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN
151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF*

ORF30ng (SEQ ID NO: 180) and ORF30-1 (SEQ ID NO: 176) show 98.3% identity in 181 aa
overlap:

                       10        20        30        40        50        60
orf30ng.pep    MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1        MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
                       10        20        30        40        50        60

                       70        80        90         100       110
orf30ng.pep    LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI
               ||||||||||||||||||||||||||||||||  |||||||| |||||||||||||||||
orf30-1        LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI
                       70        80        90       100       110       120

              120       130       140       150       160       170
orf30ng.pep    KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf30-1        KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR
                      130       140       150       160       170       180

              180
orf30ng.pep    FX
               ||
orf30-1        FX



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 22 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 181):



1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 
51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 
101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 
151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT
201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA
251 TT..

This corresponds to the amino acid sequence (SEQ ID NO: 182; ORF31):

1 MNKTLYRVIF NRKRGAVXAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCXVTNIF SFSLLGFSLC LAVGTXNIAF ADGI..

Further work revealed a further partial nucleotide sequence (SEQ ID NO: 183): 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT
51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT
151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC
201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT..

This corresponds to the amino acid sequence (SEQ ID NO: 184; ORF31-1):

1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSDSGSAH VKSVPFGTTH
51 APVCRSNIFS FSLLGFSLCL AVGTANIAFA DGI..

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. gonorrhoeae

ORF31 (SEQ ID NO: 182) shows 76.2% identity over an 84aa overlap with a predicted ORF
(ORF31.ng) (SEQ ID NO: 186) from N. gonorrhoeae:

orf 31 .pep MNKTLYRVI FNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNI F 60 

25 II Ml II II III I I II I I II II I II II II I II I llh: II II I II =: I 

O r f 3 1 ng MNKTLYRV I FNRKRGAWAVAETT KREGKS CADSGSGS VYVKS VS F I PTH SKAF 54 

' orf 31. pep SFSLLGFSLCLAVGTXNIAFADGI 84 

II MINIM MINIM 

orf 31ng CFSALGFSLCLALGTWIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV 114 

The complete length ORF31ng nucleotide sequence (SEQ ID NO: 185) is: 



1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT
51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT
151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT
201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG
251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG
301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA
351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA
401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG
451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC
501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG
551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT
601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA
651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG
701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA

This encodes a protein having amino acid sequence (SEQ ID NO: 186):



1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming
hemolysin-like HecA protein (SEQ ID NO: 1125) from Erwinia chrysanthemi (accession number
L39897):

orf31ng   96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154
             GNG+P VNI TP ++G+S N+Y  F+V NRG ILNN  +  T +QLGG IQ NP L
HecA      45 GNGVPVVNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104

orf31ng  155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214
             A  ++N++ S + S+L GY+EV G+ A VV+ANP GI  +G GF+N  R TLTTG PQ+
HecA     105 AAAILNEVVSPNRSRLAGYLEVAGQAANVVVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164

orf31ng  215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242
              AG  SG  +R G+ +I G GLDA  +D+
HecA     165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193

Furthermore, ORF31ng (SEQ ID NO: 186) and ORF31-1 (SEQ ID NO: 184) show 79.5% identity 
in 83 aa overlap: 

10 20 30 40 50 60 

30 orf 3 1 - 1 . pep MNKTLYRVI FNRKRGAWAVAETTKREGKS CADSDSGSAHVKS VP FGTTHAPVCRSN I FS 

I M I I I I I I I I I I I I I I I I ! I M I ! I ! I I I I I |||::|||| III hi 

or f 3 1 ng MNKTLYRV I FNRKRGAWAVAETTKREGKS CADSGSGS VYVKSVS F I PTH SKAFC 

10 20 30 40 50 

70 80 
35 orf 31-1. pep FSLLGFSLCLAVGTANIAFADGI 

II lllllllhlhllllllll 
orf31ng FSALGFS LCLALGTVN I AF ADG 1 1 TDKAAPKTQQAT I LQTGNG I PQVN IQTPTS AGVSVN 

60 70 80 90 100 110 

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 187):

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA 

51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG 

101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT 

151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA 

201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG..

This corresponds to the amino acid sequence (SEQ ID NO: 188; ORF32):

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 189): 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG
351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG
401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG
451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC
501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA
651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC
951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC
1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG

This corresponds to the amino acid sequence (SEQ ID NO: 190; ORF32-1): 

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL
101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG
151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR
201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW
351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF32 (SEQ ID NO: 188) shows 93.8% identity over an 81aa overlap with an ORF (ORF32a)
(SEQ ID NO: 192) from strain A of N. meningitidis:

                       10        20        30        40        50        60
orf32.pep      MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
               ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a         MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                       10        20        30        40        50        60

                       70        80
orf32.pep      CVHQDIHVRTWHSDAADIDTA
               |||||||||||||||||||||
orf32a         CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                       70        80        90       100       110       120

The complete length ORF32a nucleotide sequence (SEQ ID NO: 191) is: 

1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG
351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA
401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA
451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC
501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA
651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC
951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCGGTTAT CTTTTTGGGC AGCCTTCCGC
1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG

This encodes a protein having amino acid sequence (SEQ ID NO: 192):

1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL
101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG
151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR
201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW
351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*
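
The X residues in this sequence line up with codons that contain an undetermined base (N) in SEQ ID NO: 191. One plausible way to derive them, shown here only as a sketch, is to expand each N to all four bases and report a residue only when every expansion encodes the same amino acid, otherwise X. The standard genetic code is assumed, and the helper names `translate_codon` and `translate_ambiguous` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate_codon(codon):
    """Translate one codon; N is expanded, anything unresolved becomes 'X'."""
    codon = codon.upper()
    if any(base not in "TCAGN" for base in codon):
        return "X"
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(choice)] for choice in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate_ambiguous(dna):
    seq = "".join(dna.split())
    usable = len(seq) - len(seq) % 3
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, usable, 3))


# Six in-frame codons taken from row 601 of SEQ ID NO: 191 (codons 210 to 215).
print(translate_ambiguous("CTGGCNGGGGCGCANATT"))
```

Here GCN resolves to A because all four GCx codons encode alanine, while CAN remains X because it could encode H or Q, matching residues 210 to 215 (LAGAXI) of the listed protein.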

ORF32a (SEQ ID NO: 192) and ORF32-1 (SEQ ID NO: 190) show 93.2% identity in 382 aa overlap:

                       10        20        30        40        50        60
orf32-1.pep    MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP
               ||||||    |||||||||||||||||||||||||||||||||||||||||||||||||
orf32a         MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX
                       10        20        30        40        50        60

                       70        80        90       100       110       120
orf32-1.pep    CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE
               ||||||||||||||||||||||| ||||||||||||||||||||||||||| |||||||
orf32a         CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
                       70        80        90       100       110       120

                      130       140       150       160       170       180
orf32-1.pep    SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS
               |||||| |||||| | | |||||||| |||||||||||||||||  ||| ||||||||
orf32a         SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP
                      130       140       150       160       170       180

                      190       200       210       220       230       240
orf32-1.pep    EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA
               ||||||||||||||||||||||||| ||||||  ||||||| ||||||||||||||||||
orf32a         EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA
                      190       200       210       220       230       240

                      250       260       270       280       290       300
orf32-1.pep    SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
               |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf32a         SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
                      250       260       270       280       290       300

                      310       320       330       340       350       360
orf32-1.pep    AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY
               |||||||||||||| |||||||||||||||||||||||||| ||||||||||||||||||
orf32a         AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY
                      310       320       330       340       350       360

                      370       380
orf32-1.pep    LFGQPSAPEKLAAFVSKHQKIRX
               ||||||| |||||||||||||||
orf32a         LFGQPSASEKLAAFVSKHQKIRX
                      370       380



Homology with a predicted ORF from N. gonorrhoeae 

ORF32 (SEQ ID NO: 188) shows 95.1% identity over an 82aa overlap with a predicted ORF
(ORF32.ng) (SEQ ID NO: 194) from N. gonorrhoeae:

orf32.pep        MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP   57
                 |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng        MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP   60

orf32.pep      DVPCVHQDIHVRTWHSDAADIDTA                                       81
               ||| ||||||||||||||||||||
orf32ng        DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS  120



An ORF32ng nucleotide sequence (SEQ ID NO: 193) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 194): 

1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS 

51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE 

101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK 

151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD

201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI 

251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD 

301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL* 

Further sequencing revealed the following DNA sequence (SEQ ID NO: 195): 

1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA
51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC
101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG
151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT
201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG
251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG
301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT
351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG
401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC
451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA
501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC
551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG
601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT
651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC CTGCAAAAtg
701 aaggcgGTGT CTTTCagacG gcatccgTcC gccttGTCAA AAtcCCGTTC
751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT
801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT
851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC
901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC
951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG
1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC
1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC
1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT
1151 AG

This encodes a protein having amino acid sequence (SEQ ID NO: 196; ORF32ng-l): 



1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL
51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV
101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG
151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW
201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF
251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL
301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG
351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR*
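
The text later points to an RGD sequence in the gonococcal protein as a feature characteristic of adhesins. Locating such a tripeptide is a plain substring search; the sketch below illustrates it on residues 181 to 200 of the sequence above. The function name `find_motif` and the variable names are illustrative assumptions, and finding the motif is not by itself evidence of adhesin function.

```python
def find_motif(protein, motif="RGD"):
    """Return 1-based start positions of every occurrence of motif."""
    seq = "".join(protein.split()).upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)          # convert to 1-based numbering
        start = seq.find(motif, start + 1)   # allow overlapping matches
    return positions


# Residues 181-200 of ORF32ng-1 as listed above.
fragment = "PEWLLFGYRG DVWAKWLDMW"
offset = 180  # number of residues preceding the fragment

print([offset + p for p in find_motif(fragment)])
```

In this fragment the motif starts at the ninth residue, which corresponds to position 189 of the full-length sequence.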

ORF32ng-1 (SEQ ID NO: 196) and ORF32-1 (SEQ ID NO: 190) show 93.5% identity in 383 aa
overlap:

                        10        20        30        40        50       59
orf32-1.pep    MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
               |||  | |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf32ng-1      MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
                       10        20        30        40        50        60

              60        70        80        90       100       110      119
orf32-1.pep    PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE
               | |||||||||||||||||||||||| |||||||||||||| ||||||||||||||||||
orf32ng-1      PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE
                       70        80        90       100       110       120

             120       130       140       150       160       170      179
orf32-1.pep    ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
               |||||||||||||||||||||||||||||||||||||| ||||||||||| || ||||||
orf32ng-1      ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA
                      130       140       150       160       170       180

             180       190       200       210       220       230      239
orf32-1.pep    SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT
                |||||||| ||||||| || |||| ||||||| |||||||||||||| |||| | ||||
orf32ng-1      PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT
                      190       200       210       220       230       240

             240       250       260       270       280       290      299
orf32-1.pep    ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL
               ||||||||||||||||| |||||||||||||||||| |||||||||||||||||||||||
orf32ng-1      ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL
                      250       260       270       280       290       300

             300       310       320       330       340       350      359
orf32-1.pep    HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
               ||||||| ||||||| | || |||||||||||||||||||||||||||||||||||||||
orf32ng-1      HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
                      310       320       330       340       350       360

             360       370       380
orf32-1.pep    YLFGQPSAPEKLAAFVSKHQKIRX
               |||||||| |||||||||||||||
orf32ng-1      YLFGQPSASEKLAAFVSKHQKIRX
                      370       380

On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins, it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF32-1 (SEQ ID NO: 190) (42kDa) was cloned in pET and pGex vectors and expressed in
E. coli, as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figure 7A shows the results of affinity purification of the His-fusion protein, and
Figure 7B shows the results of expression of the GST-fusion in E. coli. Purified His-fusion protein
was used to immunise mice, whose sera were used for ELISA, giving a positive result. These
experiments confirm that ORF32-1 (SEQ ID NO: 190) is a surface-exposed protein, and that it is
a useful immunogen.
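
The 42 kDa figure quoted for ORF32-1 is consistent with a common rule of thumb for protein size. The sketch below is a rough estimate only, assuming an average residue mass of about 110 Da (a widely used approximation, not a value taken from this document); the function name `approx_mass_kda` is illustrative, and an exact mass would require summing the individual residue masses.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb value, an assumption


def approx_mass_kda(n_residues, avg_residue_mass=AVERAGE_RESIDUE_MASS_DA):
    """Rough molecular mass in kDa from residue count alone."""
    return n_residues * avg_residue_mass / 1000.0


# ORF32-1 (SEQ ID NO: 190) is listed with 382 residues.
print(approx_mass_kda(382))
```

With 382 residues the estimate is about 42 kDa, in line with the value given for the cloned protein.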



Example 24 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 197): 



1 ..TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG
51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG
101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC
151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT
201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA
251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA
301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA
351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA
401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG..

This corresponds to the amino acid sequence (SEQ ID NO: 198; ORF33): 



1 . . LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL . . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 199):

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA
51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG
151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT
251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC
651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT
801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC
851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG
1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC
1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A



This corresponds to the amino acid sequence (SEQ ID NO: 200; ORF33-1):

1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM
51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)



ORF33 (SEQ ID NO: 198) shows 90.9% identity over a 143aa overlap with an ORF (ORF33a)
(SEQ ID NO: 202) from strain A of N. meningitidis:

                                                     10        20        30
orf33.pep                                    LFLRVKVGRFFSSPATWFRXKDPVNQAVLR
                                             ||||||||||||||||||| ||||||||||
orf33a         LMDNQGLNFFLVLAGVXGMNTLMLAVWLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR
              90       100       110       120       130       140

                       40        50        60        70        80        90
orf33.pep      LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA
               || ||||| |||||| ||||||||||||||||||||||||||||||||||||    |||
orf33a         LYADEWRXPSVRWKIGATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLGDSSSVRL
             150       160       170       180       190       200

                      100       110       120       130       140
orf33.pep      VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL
               |||||||| |||||||||| ||||||||||||||||||||| |||| ||||||
orf33a         VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRLLAWAVCK
             210       220       230       240       250       260

orf33a         ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE
             270       280       290       300       310       320

The complete length ORF33a nucleotide sequence (SEQ ID NO: 201) is: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA
51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG
151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT
251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTGCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC
651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT
801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC
851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG
1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC
1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 202): 

1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM
51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL
101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT*

ORF33a (SEQ ID NO: 202) and ORF33-1 (SEQ ID NO: 200) show 94.1% identity in 444 aa overlap:



10 20 30 40 50 60 

orf 33a .pep ' MLNP S RKLVELVR I LE EGG F I FSGDPVQATEALRRVDGSTEEKI IRRAKM IDRNRMLRET 

II M 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 M I M 1 1 II M I II 1 1 1 II M 1 1 1 1 M 1 1 II 1 1 1 1 1 1 

30 orf33-l MLNPSRKLVELVRILDEGGF I FSGDPVQATEALRRVDGSTEEKI I RRAEM IDRNRMLRET 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 33a . pep LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML 

I II 1 1 1 1 1 1 M M 1 1 1 1 1 I II MIMIMMIMMM llllllllllllll 

35 orf 33 - 1 LERVRAGSFWLWWAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 3 3a . pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHSLWLCTLLGML 

Mill MMII IIIIIIMIIMI MIIIMI MMMMIIMM IIMM 

40 orf 33 - 1 FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHS LWLCTLLGML 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 3 3a . pep VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAV IEGRLNGNIA 

Mill IMMIMIMM M-MM I M 1 1 1 1 1 M 1 1 1 1 1 1 I Ml I Ml II 1 1 1 

45 orf 33- 1 VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAV IEGRLNGNIA 

190 200 210 ' 220 230 240 



250 260 270 280 290 300 

orf 3 3a. pep DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXI RRWQNKITDA 

I I I I I I I I I I I I I I I I I I I I II I I h I I I I I llllllllll I I I I I I I I I I I 

50 orf33-l DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 

250 260 270 280 290 300 
310 320 330 340 350 360

orf 3 3a . pep DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE 

M 1 1 1 1 1 1 1: 1 1 hi 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M I II II 1 1 1 M hi II 1 1 1 i 1 1 

orf 33 - 1 DTRRETVSAVS PKI ILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 

5 310 320 330 340 350 360 

370 380 390 400 410 420 

orf 33a . pep ' TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWXLLAEQGLSDDLSEKLEHW 

1 1 II II II II 1 1 1 Mill IIIMIII III MINI Mill I II III MM II III 1 1 II 

orf 33 - 1 TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWQLLAEQGLSDDLSEKLEHW 
10 370 380 390 400 410 420 

430 440 450 

or f 3 3 a . pep RNALTECGAAWLEPDRAAQEGRLKTNDRTX 

I llhhh llllllllhll 
orf33-l RNALAECGAAWLEPDRAAQEGRLKDQX 
15 430 440 

Homology with a predicted ORF from N. gonorrhoeae 

ORF33 (SEQ ID NO: 198) shows 91.6% identity over a 143aa overlap with a predicted ORF 
(ORF33.ng) (SEQ ID NO: 204) from N. gonorrhoeae: 

. orf 33. pep LFLRVKVGRFFSSPATWFRXKDPVNQAVLR 30 

20 II MM I Ml III II III I I I Mill II 

orf 3 3ng LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100 

orf 33 . pep LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 90 

II MM IMIII hhlllllll IllllliUlllilh llllllll III 

orf33ng LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160 

25 orf 33 .pep ' VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 143 

1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 h I h 1 1 MM IMIII 

orf33ng VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWWCK 220 

An ORF33ng nucleotide sequence (SEQ ID NO: 203) was predicted to encode a protein having amino acid sequence (SEQ ID NO: 204):

1 MIDRDRMLRD TLERVRAGSF WLWWVASMM FTAGFSGTYL LMDNQGLNFF
51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR
101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST
151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL
201 VGSIVCYGIL PRLLAWWCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD
251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV
301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA
351 WQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ*

Further sequence analysis revealed the following DNA sequence (SEQ ID NO: 205):



1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa 

51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc 

101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg 

151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg 

201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT 

251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA 

301 GTTTTggcgG GAGTGTtggG CATGaatacG ctgATGCTGG CAGTATGGtt 

351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG

451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC

501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC 

751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT 

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA

951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC 

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG 

1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT 

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A 

This encodes a protein having amino acid sequence (SEQ ID NO: 206; ORF33ng-1):

1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM
51 IDRDRMLRDT LERVRAGSFW LWVWASMMF TAGFSGTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL
151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIVCYGILP RLLAWWCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ*


ORF33ng-1 (SEQ ID NO: 206) and ORF33-1 (SEQ ID NO: 200) show 94.6% identity in 446 aa overlap:

10 20 30 40 50 60 

orf 33 - 1 . pep MLNPS RKLVELVR I LDEGGF I FSGDPVQATEALRRVDGSTEEKI I RRAEMIDRNRMLRET 

40 | | | | | | M | | | | | | | : : | | M | M I II I I I I I I I I I I I I I I I I h I I I I I I I h I I I h I 

orf 33ng-l MLNPS RKLVELVR I LNKGGF I FSGDPVQATEALRRVDGSTEEKI FRRAEM I DRDRMLRDT 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 33 - 1 . pep LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML 

45 Illllllllllllhl:: I :|ll Mill Mill II llllllll I MM II Mill I 

or f 3 3 ng - 1 LERVRAGS FWLWVWASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 33-1. pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 
50 || | | | | || || | | || || || | || | || || | | || | | : | | | | | || || I II h M M II II I II I 

or f 3 3 ng - 1 FLRVKVGRFFSS PATWFRGKGPVNQAVLRLYADQWRQPS VRWKIGATAHS LWLCTLLGML 

130 140 150 160 170 180 
190 200 210 220 230 240

orf 33 - 1 . pep VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

IIIMIIIIIIIIIMIIMMIMIIIII lllllllllll MM IMIIIMI! 

orf 33ng-l VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 33 - 1 . pep DARAWSGLLVGSIACYGILPRLLAWWCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA 

1 1 H 1 1 1 M M M 1 1 1 1 1 M M 1 1 1 M I I M 1 1 1 1 1 1 llllllllllllllll 

orf33ng-l DARAWSGLLVGSIVCYGILPRLLAWWCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA 

250 260 270 280 290 300 






310 320 330 340 350 360 

or f 3 3 - 1 . pep DTRRETVSAVS PKI I LNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 

Mill IMIIIIIMII llllhllllllllMIIMIIMMIIIIMMIMII 

or f 3 3ng- 1 DTRRETVSAVS PKI VLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE 

310 320 330 340 350 360 






370 380 390 400 410 420 

orf 33 - 1 . pep TELKQKPAQLL I GVRAQTVPDRGVLRQ I VRLS EAAQGGAWQLLAEQGLSDDLS EKLEHW 

I II II I II II Mill I II II II Mlllll I MM I lllllllllll III I UN I Mill 

orf 33ng-l TELKQKPAQLL I GVRAQTVPDRGVLRQ I VRLS EAAQGGAWQLLAEQGLSDDLS EKLEHW 

370 380 390 400 410 420 






430 440 
orf 3 3 - 1 . pep RNALAECGAAWLEPDRAAQEGRLKDQX 

I I l.hl MM III I I hi I I Ml II II 
orf33ng-l RNALTECGAAWLE PDRVAQEGRLKDQX 

430 440 



Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
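The text above refers to putative transmembrane domains but does not say how they were predicted. As an illustration only, the sketch below scans a protein with a Kyte-Doolittle hydropathy window, one common way of flagging candidate membrane-spanning stretches. The function name, the 19-residue window and the 1.6 cutoff are assumptions chosen for this example and are not taken from the specification; the test segment is a stretch of ORF33ng-1 as listed above.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean) for every window whose mean hydropathy
    exceeds the threshold. Start positions are 1-based. Residues
    that are not in the table (for example X) score 0."""
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KD.get(aa, 0.0) for aa in segment) / window
        if mean > threshold:
            hits.append((i + 1, round(mean, 2)))
    return hits


# Hypothetical usage: a hydrophobic stretch of ORF33ng-1 (SEQ ID NO: 206).
segment = "LWLCTLLGMLVSVLLLLLVRQYTFNWESTL"
print(hydrophobic_windows(segment))
```

A window that passes the cutoff is only a candidate; a real prediction would normally be checked with a dedicated topology tool.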

Example 25 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 207):

1 .CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT
51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG
101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG
151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC.GGCGT
201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA
251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG
301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC
351 GGGTTGGGCG GCATCTTGTT CCGACTACGC CGTTTGGCAG CCAGAATTCG
401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC
451 GTCC..



This corresponds to the amino acid sequence (SEQ ID NO: 208; ORF34): 



1 . .QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 
51 GSTGVSLSVF SACVXGWRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS 
101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS
151 S.. 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 209): 

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT
51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG
151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC
201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC
251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT
301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT
351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT
401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT
451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT
501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA
551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC
601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG
651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG
701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT
751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC
801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT
851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA
901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA
951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG
1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG
1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT
1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA
1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT
1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG
1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT
1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC
1351 CATGCCGTCT GA

This corresponds to the amino acid sequence (SEQ ID NO: 210; ORF34-1):



1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL
51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG
101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN
151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV
201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDWLV EGDDFLYADG
251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP
301 SVAGDVAGSA RQGGDGNIW HAFGGLFGTC NLTDELFFAF GGDLSEQQQV
351 AWADDGDLG RVAFGLWLA QIGTGGGFDT QRHNVWGLR AGGSAVDGGF
401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR
451 HAV*






Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 



ORF34 (SEQ ID NO: 208) shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) 
(SEQ ID NO: 212) from strain A of N. meningitidis: 
orf 34 .pep
orf34a 



10 20 30 

QKSLSR I SLWGLGGVFFGVSGLVW FSLG VSXE - 

II 



-CAC 



III lllllll I IIIIIIIIIIIMM III 

MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC 
10 20 30 40 50 60 



40 50 60 70 80 90 
orf 34 . pep FSGV S FRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA 

I ! I I I I I II I I I I I I I I I I I I H I I I : h: :|:: III I II 

orf 34a FSGV SFRGSGRG TFVGSTGVSLSVFSACA PASSGCLSVXAVSAGCGLTRXFXGA 

70 80 90 100 110 



100 110 120 130 140 150 

orf 34 . pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 

III lllllllllllhll I llllllllllllllllllllllllllllh I I I I 
orf 34a AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 



orf 34. pep S 



orf 34a PFGXNVLTMPIANAPMAVIQMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 

The complete length ORF34a nucleotide sequence (SEQ ID NO: 211) is:



1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT
51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT
151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG
201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG
251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT
301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA
351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG
401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG
501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC
551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC
751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG
851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG
901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG
951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG
1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT
1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG
1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC
1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG
1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 212):



1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX
51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA
101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR
201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDWXVEGDD
251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA
301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL
351 SEQQQVAWA DNGDLGRVXF GLWLAQIGA GGGFDTQRHY WVGXRAGGS
401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS
451 DGIALRHAV*



ORF34a (SEQ ID NO: 212) and ORF34-1 (SEQ ID NO: 210) show 91.3% identity in 459 aa overlap:



10 20 30 40 50 60 

orf 34a . pep MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC 

II I lllllll IIMMMII lllllll IIIIMMIIIIMI MM 

1 5 orf 34 - 1 MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL GCAC 

10 20 30 40 50 



70 80 90 100 110 120 

orf 34a . pep FSGVS FRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ( 1 1 1 1 1 1 1 1 1 = 1 1 1 1 r 1 1 1 i 1 1 1 1 1 1 1 1 1 1 e i ilium 

20 orf 34 - 1 FSGVS FRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 

60 70 80 90 100 110 



130 140 150 160 170 180 

orf 34a . pep LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV 

I I I I I I I I I I I E : I I I i I I I I I I M I I I I I I M I I II I I I I I I : IIMII || 
25 orf 34 - 1 LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 

120 130 140 150 160 170 






190 200 210 220 230 240 

orf 34a. pep LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA 

1 1 1 1 1 1 1 1 1 1 1 = 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 , 1 1 1 1 1 M 1 1 1 1 1 

orf 34-1 LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
180 190 200 210 220 230 






250 260 270 280 290 300 

or f 3 4 a . pep LD WXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 

MM I II 1 1 1 M II M il M MM II I II M 1 1 1 1 : MM II II ■ 1 1 M 1 1 1 II 1 1 II 

or f 3 4 - 1 LD WLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 

240 250 260 270 280 290 



310 320 330 340 350 360 

orf 34a . pep DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAWA 

MMIIIMI MIMIMMIM M M 1 1 1 1 1 1 M 1 1 1 1 1 M 1 1 II I M I II 1 1 1 1 

40 orf 34 - 1 DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 

300 310 320 330 340 350 



370 380 390 400 410 420 

orf 34a . pep DNGDLGRVXFGLWLAQ I GAGGGFDTQRHYVWGXRAGGSAVDGGFRADRRAADDCADAA 

Ml MM M 1 1 1 1 1 1 Ml I M I M MM M III Mill 1 1 hi II Ml 

45 orf 34 - 1 DDGDLGRVAFGLWLAQIGTGGGFDTQRHNWVGLRAGGSAVDGGFRADGGASDYCADAA 

360 370 380 390 400 410 



430 440 450 460 

orf 34a . pep AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX 

Ml I MM M M I II M I Ml 1 1 1 1 1 1 1 M 1 1 1 1 II 1 1 1 

50 orf 34 - 1 AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
420 430 440 450

Homology with a predicted ORF from N. gonorrhoeae 

ORF34 (SEQ ID NO: 208) shows 77.6% identity over a 161aa overlap with a predicted ORF 
(ORF34.ng) (SEQ ID NO: 214) from N. gonorrhoeae: 

5 orf34 pep QKSLSRI SLWGLGGVFFGVSGLVWFSLGVSXE CAC 35 

II llllllllhlllllllllllllllll III 

orf 34ng MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 60 

orf 34 . pep FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV GRLXXLTRFFLGA 90 

MINIMI hllllllllllllllll MM I : II II Mil 

10 or f 3 4 ng FSGVS FRGSGWGAFVGSTGVSLSVFSACVP VPVNESAARAASEGR - - GLTRFFLGA 114 

orf 34 . pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150 

III II I M 1 1 1 M I II 1 1 II M 1 1 1 M MM 1 1 1 1 M II 1 1 1 1 1 1 i 1 1 M 1 1 II 

orf 34ng AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174 






orf 34. pep S 175 
orf34ng PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 

The complete length ORF34ng nucleotide sequence (SEQ ID NO: 213) is:



1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT
51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG
101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT
151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG
201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG
251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC
301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA
351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG
401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG
451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG
501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC
551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT
601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG
701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC
751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT
801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG
851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG
901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG
951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG
1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG
1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG
1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT
1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg
1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC
1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG
1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA
1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 214): 
1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF
51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA
101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA
151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR
201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDWLVEGND
251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA
301 DFGRVPSVAG DVARSARQGG DGNVWYAFG GLFGTCNLTD ELFFAFGGDL
351 SEQQQVAWA DDGDLGRVAF GLWLAQVGT GGGFDTQRHN WIGLRAGGS
401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS
451 DGIALRHAV*



ORF34ng (SEQ ID NO: 214) and ORF34-1 (SEQ ID NO: 210) show 90.0% identity in 459 aa 
overlap: 
10 20 30 40 50

orf 34 - 1 . pep MMMPFIMLPW I AGVPAVPGQNRLSR I SLWGLGGVFFGVSGLVWFSLGVS LGCAC 

I I I I I I I I I I I I I I ! I I I I hi I II I M I I I : I I I I I II I I I I I I I I Mill 
orf34ng MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 

10 20 30 40 50 60 






60 70 80 90 100 110 

orf 34 - 1 . pep FSGVS FRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRF FLGAAGDGSP 

llllllll Nl NNINNNINN : hi I I I I I I I I II I I I I I I 

or f 3 4 ng FSGVS FRGSGWGAFVGSTGVS LS VFS ACVPVPVNES AARAASEGRGLTRFFLGAAGDGS P 

70 80 90 100 110 120 






120 130 140 150 160 170 

orf 34 - 1 . pep LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 

1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 = 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 = MINIM 

orf34ng LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 

130 140 150 160 170 180 






180 190 200 210 220 230 

orf 34 - 1 . pep LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

1 1 1 1 MINIMI llllllll MMMM MMMINMMMMMMMMI 

orf34ng LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 

190 200 210 220 230 240 
240 250 260 270 280 290

or f 3 4 - 1 . pep LDWLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 

1 1 II N I N I II I I II II 1 1 1 ! 1 1 1 1 1 1 N N 1 1 1 Nil I II II 1 1 1 IN I II 1 1 IN 

orf34ng LDWLVEGND FLYADGGADFLGNLRLFFGGEDAHNVGY I AVGNDFDARLC SGADAQQRGA 

250 260 270 280 290 300 
300 310 320 330 340 350

orf 34 - 1 . pep DFGCVPSVAGDVAGSARQGGDGNIWHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 

III MINIMI INNNNIINNM lllllllllllllll llllllll II 

orf34ng DFGRVPSVAGDVARSARQGGDGNVWYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAWA 

310 320 330 340 350 360 






360 370 380 390 400 410 

orf 34 - 1 . pep DDGDLGRVAFGL WLAQ I GTGGGFDTQRHNVWGLRAGGS AVDGGFRADGGASD YCADAA 

1 1 1 1 1 1 1 1 1 1 II 1 1 1 IN 1 1 1 II 1 1 1 II 1 1 N 1 1 M I II 1 1 II MM : I INN 

orf34ng DDGDLGRVAFGL WLAQ VGTGGGFDTQRHNW I GLRAGGSAVDDGFCADGGPADDCAEAA 

370 380 390 400 410 420 






420 430 440 450 

orf 34 - 1 . pep AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX 
Mllh llillll Mill lllllllllllllllll

orf 34ng AEGKAEDGGNQGADGVWFGFHRGLP FLGVSDG I ALRHAVX 

430 . 440 450 460 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 26 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 215): 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGJAAAAAA GAAATCGTCT TCGGCACGAC 

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG 

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC 

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG 

This corresponds to the amino acid sequence (SEQ ID NO: 216; ORF4): 

1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL 
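The correspondence between a DNA sequence and its amino acid sequence can be checked by translating codons in frame. The sketch below is illustrative only: it assumes the standard genetic code, reads from the first base, and reports any codon containing an undetermined base (N or a dot) as X, as the listings here do. The function and variable names are hypothetical. The usage line translates the first 30 bases of SEQ ID NO: 215 as given above.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate DNA in frame 1. Whitespace is ignored, '*' marks a
    stop codon, and codons with undetermined bases become 'X'."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Hypothetical usage: the first three blocks of SEQ ID NO: 215.
orf4_start = "ATGAAAACCT TCTTCAAAAC CCTTTCCGCC"
print(translate(orf4_start))  # MKTFFKTLSA
```

The function does not search for an open reading frame; it assumes the caller supplies sequence that already starts at the first codon.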

Further sequence analysis revealed the complete nucleotide sequence (SEQ ID NO: 217): 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC 

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG 

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 218; ORF4-1): 



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT
51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH
101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND
151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL
201 PRSRADVDFA WNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)



ORF4 (SEQ ID NO: 216) shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) (SEQ ID NO: 220) from strain A of N. meningitidis:



                    10        20         30        40        50       59
orf4.pep    MKTFFKTLSAAALALILAACG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
            ||||||||||||||||||||| ||||||||||||||||||| ||||||||||||||||||
orf4a       MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE
                    10        20        30        40        50        60

           60        70        80        90
orf4.pep    QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL
             || ||||||||||||| ||||| |||||||||
orf4a       XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ
                    70        80        90       100       110       120

orf4a       VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX
                   130       140       150       160       170       180

The complete length ORF4a nucleotide sequence (SEQ ID NO: 219) is:



1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC 

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN 

351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA
401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT
501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN 
551 NNNNANNNNT NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN 
601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 

651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 






This is predicted to encode a protein having amino acid sequence (SEQ ID NO: 220): 



1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT
51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH
101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND
151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX
201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*



CHIR-0160 (356.001) 






A leader peptide is underlined. 

Further analysis of these strain A sequences revealed the complete DNA sequence (SEQ ID NO: 221):

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT
51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG
101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC
151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA
201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC
301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA
351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA
401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC
451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT
501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG
551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG
601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC
651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT
701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA
751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA

This encodes a protein having amino acid sequence (SEQ ID NO: 222; ORF4a-1):

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT
51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH
101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND
151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL
201 PRSRADVDFA WNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ
251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK*

ORF4a-1 (SEQ ID NO: 222) and ORF4-1 (SEQ ID NO: 218) show 99.7% identity in 287 aa overlap:



                    10        20        30        40        50        60
orf4a-1     MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf4a-1     QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf4a-1     VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf4a-1     ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
                   190       200       210       220       230       240

                   250       260       270       280
orf4a-1     AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
            |||||||||||||||||||||||||||||||||||||||||||||||
orf4-1      AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
                   250       260       270       280

Homology with an outer membrane protein of Pasteurella haemolytica (accession q08869) (SEQ ID 
NO: 1126). 

ORF4 (SEQ ID NO: 216) and this outer membrane protein (SEQ ID NO: 1126) show 33% aa 
identity in 91aa overlap: 

10 20 

lip2 .pasha MNFKKLLGVALVSALALTACKDEKAQAP 

15 || | ::|| || 1=11 =|: I 

ORF4 VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL- - ALILAACGFKKTARPPHPL 

110 120 130 140 150 

30 40 50 60 70 80 

1 ip2 . pasha . - ATTAKTENKAPLKVGVMTGPEAQMTEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKD 
20 : :: | : |: :| ::|:: :: | | | | : | | : | | : | : : | | || s 

ORF4 L P P PTT ARRKKE I VFGTT VGD FGDMVKEQ I Q AE LE KKG YTVKLVE FTD YVRPNLALAEGE 

160 170 180 190 200 210 






90 100 110 120 130 140 

lip2 .pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLVWPIAAYSKKIKNISELKDGATVAIPNNAS 

I 

ORF4 L 

Homology with a predicted ORF from N. gonorrhoeae 

ORF4 (SEQ ID NO: 216) shows 93.6% identity over a 94aa overlap with a predicted ORF 
(ORF4.ng) (SEQ ID NO: 224) from N. gonorrhoeae: 

30 10 20 30 

orf 4nm . pep MKTFFKTLSAAALAL I LAACGXQKDS APAA 

II II II II M:|ll II IN I llllll 
orf4ng RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA 

200 210 220 230 240 250 

35 40 50 60 70 80 89 

or f 4nm . pep SASA- AADNGAAKKE I VFGTTVGD FGDMVKEQ I QAELE KKG YTVKLVE FTD YVRPNLALA 

I hi Mill III I lillllllllll I Mlllll Mil I I M II lillllllllll 
orf4ng SAAAPSADNGAAKKE I VFGTTVGD FGDMVKEQ I QAELE KKG YTVKLVE FTD YVRPNLALA 

260 270 280 290 300 310 

90 

orf4nm.pep EGEL 

I I I I 

orf4ng EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 

320 330 340 350 360 370 

The complete length ORF4ng nucleotide sequence (SEQ ID NO: 223) was predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 224): 



1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

Further analysis revealed the complete length ORF4ng DNA sequence (SEQ ID NO: 225) to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG 

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC 

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA 

601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA 

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA 

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC 

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC 

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG 

851 AAGGCGCAGC CAAATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 226; ORF4ng-1): 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

This shows 97.6% identity in 288 aa overlap with ORF4-1 (SEQ ID NO: 218): 



10 20 30 40 50 59 

or f 4 - 1 . pep MKTFFKTLS AAALAL I LAACGGQKDSAPAAS AS A - AADNGAAKKE I VFGTTVGDFGDMVK 

II 1 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 II 1 1 1 M H 1 1 1 1 1 1 1 1 1 M .1 1 1 1 1 1 1 U I 

or f 4 ng - 1 MKTFFKTLS AAALAL I LAACGGQKDSAPAAS AAAPS ADNGAAKKE I VFGTTVGDFGDMVK 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf4-1.pep EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF 

1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Ml , 1 1 1 II II 1 1 1 h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h I 

orf4ng-l EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 4 - 1 . pep QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS 

MINIMUM jllllllMII MINI MINN MIIMIIMM MINIMI 

orf4ng-l QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 

130 140 150 160 170 180 






180 190 200 210 220 230 239 

orf 4 - 1 . pep KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW 

I I I I I I I I I I I I I II I I I I II I I I I I I I I M I I I I I I I I I I I I I I , M I I I I I I I I I I I I 
orf4ng-l KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 

190 200 210 220 230 240 






240 250 260 270 280 

orf 4 - 1 . pep SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX 

lllllllllllllllllllll INI lllllll Mlllll II 

orf 4ng- 1 SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

250 260 270 280 
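
The identity figures quoted for alignments such as the one above can be recomputed from the aligned rows. The following is a minimal sketch in Python, not the program that produced the alignments; the function name `percent_identity` is illustrative. It assumes that every aligned column, including a gap column, counts towards the overlap, and that a gap never counts as an identity.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-"):
    """Return (identical, overlap, percent) for two aligned rows of equal length.

    Assumption: the overlap is the number of aligned columns (gap columns
    included), and only columns holding the same residue are identities.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    overlap = len(row_a)
    return identical, overlap, 100.0 * identical / overlap


# Last block of the ORF4-1 / ORF4ng-1 alignment (terminal X left out).
orf4_1 = "SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK"
orf4ng_1 = "SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK"
identical, overlap, pct = percent_identity(orf4_1, orf4ng_1)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")  # 47/48 identical (97.9%)

# A short stretch of the same alignment that contains a gap column.
print(percent_identity("SASA-AADNG", "SAAAPSADNG"))  # (7, 10, 70.0)
```

Applied block by block and summed, the same count gives the overall figure for a full alignment.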



In addition, orf4ng-1 (SEQ ID NO: 226) shows significant homology with an outer membrane 
protein (SEQ ID NO: 1126) from the database: 

ID LIP2_PASHA STANDARD; PRT; 276 AA. 
AC Q08869; 
DT 01-NOV-1995 (REL. 32, CREATED) 
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 
DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . . 
SCORES Init1: 279 Initn: 416 Opt: 494 

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap 

10 20 30 40 50 

or f 4ng- 1 . pep MKTFFKTLSAAAL- - ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 

II I ::|| II Ml =1 HII-I :: = | II h =1 -I 

lip2_pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

10 20 30 40 50 

60 70 80 90 100 110 

orf 4ng- 1 . pep VKEQ I QAELEKKG YTVKL VE FTD YVRPNLALAEGELD I NVFQHfCP YLDD FKKEHNLD I TE 

:: - II I I = I I : I I : I : : I I II • I I 1 = 11 III- h = = = = 
lip2_pasha TEVAVKIAKEKYGLDVELVQFTEYTQPNAALHSKDLDANAFQTVPYLEQEVKDRGYKLAI 

60 70 80 90 100 110 



40 



120 130 140 150 160 170 

orf 4ng-l.pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT 

:: : h= I hi- hllhlh Ih II llll-h I =1111 I = 
lip2_pasha IGNTLVWPIAAYSKKIKNISELKDGATVAIPNNASNTARALLLLQAHGLLKLKDPKN- VF 

120 130 140 150 160 170 



45 



180 190 200 210 220 230 

orf 4ng-l .pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTE- -ALFQEPSFA 

|:: II II I I lllh = = = I I I I = = I I = I = = I I : : I : : : = : : 
lip2_pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 

180 190 200 210 220 230 

240 250 260 270 280 289 

orf4ng-1.pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

lip2_pasha 

240 250 260 270 



Based on this analysis, including the homology with the outer membrane protein of Pasteurella 
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment 
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
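
A lipid attachment site of the kind referred to above can be looked for with a simple pattern search. The sketch below is illustrative only: the document does not state which pattern or program was used. It assumes the PROSITE PS00013 consensus, taken here as {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, with the usual side conditions that the cysteine lies between positions 15 and 35 and that a K or R occurs in the first seven residues. The function name `lipid_attachment_cysteine` is hypothetical.

```python
import re
from typing import Optional

# Assumed consensus (see lead-in); the lookahead lets candidate sites overlap.
LIPOBOX = re.compile(r"(?=([^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C))")


def lipid_attachment_cysteine(protein: str) -> Optional[int]:
    """Return the 1-based position of the candidate lipidated cysteine, or None."""
    protein = protein.upper()
    # Side condition (assumed): at least one K or R among the first seven residues.
    if not any(residue in "KR" for residue in protein[:7]):
        return None
    for match in LIPOBOX.finditer(protein):
        # The end index of the captured site equals the 1-based position of its C.
        cys_position = match.end(1)
        if 15 <= cys_position <= 35:
            return cys_position
    return None


# N-terminus of the gonococcal protein ORF4ng-1 (SEQ ID NO: 226).
orf4ng_1_start = "MKTFFKTLSAAALALILAACGGQKDSAPAA"
print(lipid_attachment_cysteine(orf4ng_1_start))  # 20
```

Under these assumptions the candidate site is the cysteine at position 20, which closes the hydrophobic N-terminal stretch of the listed sequence.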

ORF4-1 (SEQ ID NO: 218) (30 kDa) was cloned in pET and pGex vectors and expressed in E. coli, 
as described above. The products of protein expression and purification were analyzed by 
SDS-PAGE. Figures 8A and 8B show, respectively, the results of affinity purification of the His-fusion 
and GST-fusion proteins. Purified His-fusion protein was used to immunise mice, whose sera were 
used for ELISA (positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a 
bactericidal assay (Figure 8E). These experiments confirm that ORF4-1 (SEQ ID NO: 218) is a 
surface-exposed protein, and that it is a useful immunogen. 

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1 (SEQ ID 
NO: 218). 

Example 27 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 227): 



1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 
51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 
101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 
151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA 
201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 
251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 
301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 
351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 
401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG 
451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA 
501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 




551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC 
601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 
651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC . . . . 
701 GC AGACACGCCC GCCGCATCCG 
751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 
801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 
851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA 
901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 228; ORF8): 

1 PRRP RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR 
51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 
101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 
151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 
201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 
251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 
301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results: 
Sequence motifs 

ORF8 (SEQ ID NO: 228) is proline-rich and has a distribution of proline residues consistent with a 
surface localization. Furthermore the presence of an RGD motif may indicate a possible role in 
bacterial adhesion events. 
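
A scan for these sequence features can be sketched as follows. This is an illustrative Python example, not the analysis software used here; the helper names `motif_positions` and `residue_fraction` are hypothetical, and the input is the N-terminal ORF8 fragment as listed above.

```python
def motif_positions(protein: str, motif: str) -> list:
    """1-based start positions of every (possibly overlapping) occurrence of motif."""
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)
    return positions


def residue_fraction(protein: str, residue: str) -> float:
    """Fraction of the sequence made up of the given residue."""
    return protein.count(residue) / len(protein)


# First 44 residues of ORF8 (SEQ ID NO: 228).
orf8_fragment = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"
print(motif_positions(orf8_fragment, "RGD"))           # [11]
print(orf8_fragment.count("P"), len(orf8_fragment))    # 5 44
print(f"{residue_fraction(orf8_fragment, 'P'):.2f}")
```

The same two helpers can be run over the complete sequence to list every RGD occurrence and the overall proline content.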

Homology with a predicted ORF from N. gonorrhoeae 

ORF8 (SEQ ID NO: 228) shows 86.5% identity over a 312aa overlap with a predicted ORF 
(ORF8.ng) (SEQ ID NO: 230) from N. gonorrhoeae: 



orf8ng     1 MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR 50
                    |||||||| | |||| |||||| ||||||||||||||||||||
orf8.pep   1       PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR 44

orf8ng    51 QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP 100
             ||||||  |||||||||||||||||||  | |||| ||||||||||||| 
orf8.pep  45 QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT 94

orf8ng   101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 150
              || |||||| ||| |||||||||||||||||||||||||||||||||||
orf8.pep  95 HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 144

orf8ng   151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP 200
             | | ||   | | |||||||||||||| ||||||| ||| ||||||||||
orf8.pep 145 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP 194

orf8ng   201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ 250
              |||||||||||||||||||||||||||| ||||||||           |
orf8.pep 195 XNRQHHRAAPDHRRQAAISQTQRQRNPAAXPPLHTAPN           Q 244

orf8ng   251 TRPPHPHRHRHQPRTGSPRRTPPLPMAGFPLAQHQYASGNFRPRHPPATH 300
             |||||||||||||||||||||||||||| ||||| ||||||||||| |||
orf8.pep 245 TRPPHPHRHRHQPRTGSPRRTPPLPMAGLPLAQHRYASGNFRPRHPAATH 294

orf8ng   301 PPQMAGCPRTPTPAPKPA* 319
             ||||||||||||||||||
orf8.pep 295 PPQMAGCPRTPTPAPKPA* 313

The complete length ORF8ng nucleotide sequence (SEQ ID NO: 229) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 230): 

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR 

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP 

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP 

201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH 

301 PPQMAGCPRT PTPAPKPA* 

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis 
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 28 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 231): 

1 . . GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG 

51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT 

101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC 

151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA 

201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG 

251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG 

301 GCTTT . GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA 

351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG 

401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC 

451 GGACATTATC TCGGAGA . GG AACCATCATG CCCGGTTTCC ACCTGATGAA 

501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC 

551 GTTATCCTTT CCCGACCGG. . 

This corresponds to the amino acid sequence (SEQ ID NO: 232; ORF61): 



1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 

51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 

101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD 

151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT.. 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 233): 

1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT 

501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 

601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA 

651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA 

751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 

801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 

951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA 

1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 

1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 

1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 

1751 CCGAAGGCAG GGAATATGAA CATATTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 234; ORF61-1): 



1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 

151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG 

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE 

251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 

301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA 

401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 

501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI* 
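
The correspondence between a nucleotide listing and its amino acid listing can be checked by conceptual translation. The sketch below is a minimal Python illustration that assumes the standard genetic code and reading frame 1; `CODON_TABLE` and `translate` are illustrative names, and any codon containing an ambiguity code is reported as X.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA listing in frame 1; spaces and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguity codes (e.g. N) are not in the table and give X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 233 and its last four codons.
print(translate("ATGACGGTTT TGAAGCTTTC GCACTGGCGG"))  # MTVLKLSHWR
print(translate("GAACATATTTAA"))                       # EHI*
```

The two outputs agree with the first ten residues and the C-terminus of ORF61-1 (SEQ ID NO: 234) as listed above.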

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1 (SEQ ID 
NO: 234). Further computer analysis of this amino acid sequence gave the following results: 



Homology with the baf protein of B. pertussis (accession number U12020) (SEQ ID NO: 1127). 

ORF61 (SEQ ID NO: 232) and baf protein (SEQ ID NO: 1127) show 33% aa identity in 166aa 
overlap: 

orf61 23 
+L+D GNSRLK W + + . A AP DL LG A R +G V G 
baf 3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62 

orf61 78 EFKKAQVQEQLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131 
+ + L I WL + A G+RN YR+P++ G+DRW L + 
baf 63 LARGEAI AATLRAGGCI 

orf61 132 
+V S GTA T+D + D + G G I+PG +M+ +LA TA+L 
baf 123 jLVAS FGTATTLDT I GPDNVFPG - GL I LPGPAMMRGALAYGTAHL 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF61 (SEQ ID NO: 232) shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) 
(SEQ ID NO: 236) from strain A of N. meningitidis: 

                                                  10        20        30
orf61.pep                                 EISLRSDXRPVSVXKRRDSERFLLLDGGNS
                                          ||||||| ||||| ||||||||||||||||
orf61a      TVFEGTVKGVDGQGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS
           290       300       310       320       330       340

                    40        50        60        70        80        90
orf61.pep   RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR
            ||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||
orf61a      RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLAR
           350       360       370       380       390       400

                   100       110       120       130       140       150
orf61.pep   KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
            ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf61a      KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD
           410       420       430       440       450       460

                   160       170       180      189
orf61.pep   GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT
            ||||| |||||||||||||||||||||||||||||||||
orf61a      GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM
           470        480       490       500       510       520

orf61a      HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG
            530       540       550       560       570       580

The complete length ORF61a nucleotide sequence (SEQ ID NO: 235) is: 

1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT 
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT 
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA 
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG 
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA 
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG 
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 236): 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 
51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 
151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG 
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 
251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 
301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA 
401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 
501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT* 


ORF61a (SEQ ID NO: 236) and ORF61-1 (SEQ ID NO: 234) show 98.5% identity in 591 aa 
overlap: 

                    10        20        30        40        50        60
orf61a.pep  MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf61a.pep  LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf61a.pep  GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQIKWPN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||  ||||||
orf61-1     GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf61a.pep  DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf61a.pep  AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG
            |||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf61a.pep  QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf61a.pep  ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL
            ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf61-1     ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf61a.pep  GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf61a.pep  HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1     HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP
                   490       500       510       520       530       540

                   550       560       570       580       590
orf61a.pep  VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX
            ||||||||||||||||||||||||||||||||||| |||| ||||| | ||
orf61-1     VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX
                   550       560       570       580       590



Homology with a predicted ORF from N. gonorrhoeae 

ORF61 (SEQ ID NO: 232) shows 94.2% identity over a 189aa overlap with a predicted ORF 
(ORF61.ng) (SEQ ID NO: 238) from N. gonorrhoeae: 

orf61.pep EISLRSDXRPVSVXKRRDSERFLLLDGGNS 30 

Mill I I III II IIIIIIMM 
orf 61ng TVCEGTVKGVDGRGVLHLETAEGEQTWSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211 

orf 6 1 . pep RLKWAWVENGTFATVGSAPYRDLS PLGAEWAEKADGNVRI VGCAVCGEFKKAQVQEQLAR 9 0 

1 1 1 1 1 1 1 1 ! 1 1 1 1 1 k 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 ] 1 1 1 1 1 E 1 IIMhIIIII 

orf61ng RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR 271 

orf 61 .pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDD 150 

lllllllllll IIIIMIIIIIIIIIIIIIIIIIIIIIIMIIIIIMIIIIIIIIIII 

orf 61ng KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDD 331 

orf 61 . pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189 

Mill lllllllllllllllllllllll lllllllll 
orf 61ng GHYLG - GT I MPGFHLMKES LAVRTANLNRPAGKRYP FPTTTGNAVASGMMDAVCGS IMMM 3 90 

An ORF61ng nucleotide sequence (SEQ ID NO: 237) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 238): 



1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD 

51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN 

101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG 

151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS 

201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR 

251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW 

301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL 

351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA 

401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 

451 ESEHA* 



Further analysis revealed the complete gonococcal DNA sequence (SEQ ID NO: 239) to be: 



1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 

601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 

651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 

751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 

951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 



1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 
1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 
1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 
1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 
1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 
1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 
1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 
1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 
1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 
1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 
1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 
1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 240; ORF61ng-1): 

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY 

151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG 

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 

251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG 

301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL 

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA 

401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK 

501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA* 

ORF61ng-l (SEQ ID NO: 240) and ORF61-1 (SEQ ID NO: 234) show 93.9% identity in 591 aa 
overlap: 

orf61ng-1.pep  MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60

1 1 1 II 1 1 M 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I II

orf61-1        MTVLKLSHWRVLAEI^GLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60

orf61ng-1.pep  LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120

' I ! I I I I I I M I I I : I I I Ml IN I I 1 1 I I I I II I I I I M I I I I I I I I I I I I I I i I I

orf61-1        LWPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120

orf61ng-1.pep  GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180

I I I I I I I I I I I I I Ml I I I I, h I ■ I I I I I I I I I I I Mh I M I I : IMMIMM

orf61-1        GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180

orf61ng-1.pep  DLWGRDKLGGILIETVRAGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240

MMIIIIMIIMMIMII I MIMMMIMIMMMMMMIMIMIMM

orf61-1        DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240

orf61ng-1.pep  AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300

Illllllhll III Ml- IMM I MM M 1 1 1 1 1 II 1 1 M I M I lllllllll

orf61-1        AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300

orf61ng-1.pep  RGVLHLETAEGEQTWSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 360

M 1 1 1 1 M I M M I II I Ml M I hi Mill 1 1 1 1 1 II h 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1

orf61-1        QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360

orf61ng-1.pep  ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 420

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 M 1 1 1 1 1 1 1 1 II h 1 1 1 1 1 1 1 1 1 1 1 II 1 1 I

orf61-1        ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420

orf61ng-1.pep  GIRNHYRHPEEHGSDRWFNALGSRRFSRNACWVSCGTAVTVDALTDDGHYLGGTIMPGF 480

I I II I I II I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I II

orf61-1        GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 480

orf61ng-1.pep  HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 540

M I I I I M I Ml I I I I I I M II I II II I I Ml I I II I I I I h I I I I I I I I I M I I I

orf61-1        HLMKESI^VRTANLNRHAGKRYPFPTTTGNAVASG^DAVCGSVMMMHGRLKEKTGAGKP 540

orf61ng-1.pep  VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593

Mlillll lllllllllllllllllllll IIMIIMIIII I II

orf61-1        VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593



Based on this analysis, including the homology with the baf protein (SEQ ID NO: 1127) of 
B. pertussis and the presence of a putative prokaryotic membrane lipoprotein lipid attachment site, 
it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 29 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 241): 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC
101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG
401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC. .



This corresponds to the amino acid sequence (SEQ ID NO: 242; ORF62): 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGC . . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 243): 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 244; ORF62-1): 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK*
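
The correspondence between a nucleotide listing and its amino acid listing can be checked mechanically. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the standard codon assignments, and the function name `translate` and its `offset` parameter are hypothetical. It translates the first row of SEQ ID NO: 243, and then the row beginning at nucleotide 701 read in frame (codons start at nucleotides 1, 4, ..., 700, 703, so two bases are skipped).

```python
from itertools import product

# Standard codon assignments, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, offset: int = 0) -> str:
    """Translate a DNA string; whitespace is ignored, unknown codons become X."""
    dna = "".join(dna.split()).upper()[offset:]
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# Row 1 of SEQ ID NO: 243 as printed in the listing.
first_row = "ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC"
# Row 701 of SEQ ID NO: 243; the first complete codon starts at nucleotide 703.
row_701 = "ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG"

print(translate(first_row))          # MFYQILALIIWSSSFI
print(translate(row_701, offset=2))  # VSGLLISLEPVVGVLL
```

A row of 50 nucleotides holds 16 complete codons, so the trailing two bases of the first row are left untranslated here rather than being joined to the next row.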



Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number
Q57147) (SEQ ID NO: 1128)



ORF62 (SEQ ID NO: 242) and HI0976 (SEQ ID NO: 1128) show 50% aa identity in 114aa 
overlap: 



Orf62 1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60 

M YQILAL+IWSSS IKY +DP L+V VR R KI + K 

HI0976 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLI IAMIIVMPLFLRRWKKIDKPMRKQ 60 

Orf62 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 

L ++F NY LLQF+GLKYTSA+SA + +GLEPLL+ VFVGHFFF K + 
HI0976 61 LWWLAFFNYTAVFLLQF I GLKYTS AS S AVTM I GLEPLLWFVGHFFFKTKQNGF 114 

Homology with a predicted ORF from N. meningitidis (strain A)



ORF62 (SEQ ID NO: 242) shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) 
(SEQ ID NO: 246) from strain A of N. meningitidis: 
                      10        20        30        40        50        60
orf62.pep     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a        MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf62.pep     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a        LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf62.pep     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a        AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                     130       140       150       160       170       180

                     190       200       210
orf62.pep     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC
              ||||||||||||||||||||||||||||||||| ||
orf62a        AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI
                     190       200       210       220       230       240

orf62a        SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX
                     250       260       270       280
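
The identity figure quoted for an alignment of this kind can be reproduced by counting the aligned columns that hold the same residue. The following Python sketch is illustrative only; the function name `percent_identity` is hypothetical, and it assumes that identity is taken as identical columns divided by the overlap length. It is applied to the final 36-residue block of the ORF62/ORF62a overlap, where the two sequences differ at a single position.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of aligned columns that hold the same residue in both rows."""
    a = "".join(row_a.split())
    b = "".join(row_b.split())
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    if not a:
        raise ValueError("empty alignment")
    # A gap character aligned with a gap is not counted as an identity.
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * identical / len(a)


# Residues 181-216 of ORF62 and of ORF62a, as given in the listings.
orf62_tail = "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC"
orf62a_tail = "AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGC"

tail = percent_identity(orf62_tail, orf62a_tail)  # 35 of 36 columns identical
# Residues 1-180 are the same in both listings, so over the 216 aa overlap:
overall = 100.0 * (180 + 35) / 216

print(round(tail, 1), round(overall, 1))  # 97.2 99.5
```

Under these assumptions, one differing residue in 216 gives the 99.5% figure stated for this overlap.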

The complete length ORF62a nucleotide sequence (SEQ ID NO: 245) is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGGCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 
601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG 
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 
701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 
751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT 
801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 
851 AATAA 



This encodes a protein having amino acid sequence (SEQ ID NO: 246): 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD

201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL

251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK*
ORF62a (SEQ ID NO: 246) and ORF62-1 (SEQ ID NO: 244) show 98.9% identity in 284 aa
overlap:

orf62a.pep    MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1       MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62a.pep    LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1       LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62a.pep    AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1       AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62a.pep    AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240
              ||||||||||||||||||||||||||||||||| || |||||||||||||||||||||||
orf62-1       AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62a.pep    SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285
              ||||||||||||||||||||||| |||||||||||||||||||||
orf62-1       SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285




Homology with a predicted ORF from N. gonorrhoeae

ORF62 (SEQ ID NO: 242) shows 99.5% identity over a 216aa overlap with a predicted ORF
(ORF62.ng) (SEQ ID NO: 248) from N. gonorrhoeae:

orf62.pep     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
              ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng       MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62.pep     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng       LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62.pep     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng       AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62.pep     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216
              ||||||||||||||||||||||||||||||||||||
orf62ng       AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240

The complete length ORF62ng nucleotide sequence (SEQ ID NO: 247) is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC
51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC
101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC
151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT
201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT
251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG
301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT
351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG
401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG
451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC
501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT
551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC
601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG
651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA
701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG
751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT
801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG
851 ACGCGCAAAA CGGCAATGCC GTCTGA




This encodes a protein having amino acid sequence (SEQ ID NO: 248): 



1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV
51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV
101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL
151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD
201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL
251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V*



ORF62ng (SEQ ID NO: 248) and ORF62-1 (SEQ ID NO: 244) show 97.9% identity in 283 aa
overlap:

                      10        20        30        40        50        60
orf62ng.pep   MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
              ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1       MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf62ng.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1       LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf62ng.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1       AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf62ng.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI
              |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf62-1       AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI
                     190       200       210       220       230       240

                     250       260       270       280       290
orf62ng.pep   SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX
              ||||||||||||||||||||||||||||||||||  |||||
orf62-1       SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX
                     250       260       270       280

Furthermore, ORF62ng (SEQ ID NO: 248) shows significant homology to a hypothetical
H. influenzae protein (SEQ ID NO: 1128):
sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20)
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128

Score = 106 bits (262) , Expect = 2e-22 

Identities = 56/114 (49%), Positives = 68/114 (59%) 

Query: 1 MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60

M YQILAL+IW SS I K Y +DP L+V VR R KI + K 

Sbjct: 1 MLYQILALLIWSSSLIVGKLTYSMMDPVLWQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114 

L + + F NY LLQF+GLKYTSA+SA + +GLEPLL+ VFVGHFFF K + 
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLWFVGHFFFKTKQNGF 114 

Based on this analysis, including the homology with the transmembrane protein (SEQ ID NO:
1128) of H. influenzae and the putative leader sequence and several transmembrane domains in the
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 30 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 249): 



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGgtA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA
301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT
451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG
501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC
551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa 
601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 
651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 
701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT 
751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA 
801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 
851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 
901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 
951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC . . 

This corresponds to the amino acid sequence (SEQ ID NO: 250; ORF64): 



1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 
51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 
101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 

301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA 

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT . .
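
The partial nucleotide sequence above contains IUPAC ambiguity symbols (for example m, w, s and k), and the corresponding amino acid positions are shown as X. The following Python sketch is illustrative only and is not the method used in the original analysis. It assumes the standard codon assignments and the usual IUPAC nucleotide codes; the function name `translate_ambiguous` is hypothetical. A codon is reported as a residue only when every base combination it allows encodes the same amino acid, and as X otherwise.

```python
from itertools import product

# Standard codon assignments, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate DNA that may contain ambiguity codes; unresolved codons give X."""
    dna = "".join(dna.split()).upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        # Any unrecognised symbol (such as '.') is treated like N in this sketch.
        choices = [IUPAC.get(base, "ACGT") for base in dna[i:i + 3]]
        possible = {CODON_TABLE["".join(c)] for c in product(*choices)}
        residues.append(possible.pop() if len(possible) == 1 else "X")
    return "".join(residues)


# Nucleotides 31-51 of SEQ ID NO: 249 (codons 11-17).
print(translate_ambiguous("TGCGCmGwms TCCTGkkGTA s"))  # CAXXLXX
```

Here the codon GCm still resolves to alanine because both GCA and GCC encode it, which agrees with residues 11 to 17 (CAXXLXX) of SEQ ID NO: 250.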

Further work revealed the complete nucleotide sequence (SEQ ID NO: 251): 



1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 

401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 

451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT

1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 

1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 

1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 

1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 

1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 

2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC

2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA

2101 ACGGTAAAAA CTTATGCGTA G 

This corresponds to the amino acid sequence (SEQ ID NO: 252; ORF64-1):



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP
151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA
551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK
701 TVKTYA*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N.meningitidis (strain A) 

ORF64 (SEQ ID NO: 250) shows 92.6% identity over a 392aa overlap with an ORF (ORF64a)
(SEQ ID NO: 254) from strain A of N. meningitidis:



10 20 30 40 50 60 

orf 64 . pep MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

llllllllllll I IMIIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIMIIIII 

20 orf 64a MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 64 . pep DRRDGVFGSXXAKXPXX XMFTLVAXLPGVFLFG FPAQFINGTINSWFGNDTHEALERSLN 

Mill II II MINI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ■ 1 1 1 1 1 1 1 1 1 1 1 1 M I 

25 orf 64a DRRDGVFGSQIAKR-LS GMFTLVAVLPGVFLFGV SAQFINGTINSWFGNDTHEALERSLN 

70 80 90 100 110 



130 140 150 160 170 180 

orf 64 . pep LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 

III MIIIIIIMIMIIM lllllll 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 1 Mill 

30 orf 64a LS KS ALNLAADNALGNAI P VQ I DX IGAASLPXDMGRVLEHYAGSGFAQLAL YNAASGKI E 

120 130 140 150 160 170 



190 200 210 220 230 240 

orf 64 . pep KS INPHKLDQPFPGKARWEKIQRAGSVRDLES IGGVLYAQGWLSAGTHXGRDYALFFRQP 

1 1 1 M II 1 1 II 1 1 1 1 1 M 1 1 M 1 1 1 1 1 lllllllll Mill II 1 1 1 1 1 1 1 1 1 1 1 

35 orf 64a KSINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQP 

180 190 200 210 220 230 



250 260 270 280 290 300 

orf 64 . pep VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLAT LLIASLLSIFIiALVMALY FARRFV 

Mil IMIMMIMI i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 

40 orf 64a VPKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLAT LLIASLLSIFLALVMALY FARRFV 

240 250 260 270 280 290 



310 320 330 340 350 360 

orf 64 . pep EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 

1 1 M M 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 MM I Mill 1 1 Ml I Ml MM 

45 orf 64a EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 

300 310 320 330 340 350 
370 380 390

orf 64 . pep ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAGT 

I I I I I I I M I I I I M I II I I I I i I I I I I I I I 

orf 64a ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL 
360 370 380 390 400 410 

orf 64a LAEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGWMVIDDITVLIHAQ 
420 430 440 450 460 470 

The complete length ORF64a nucleotide sequence (SEQ ID NO: 253) is: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG 

401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC

451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG
1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT
1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG 
1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 
1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA 
1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG 
1651 CGTTCCCCTT CGNCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 
1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC 
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 
1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG 
1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC 
1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG 
2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC
2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G 

This encodes a protein having amino acid sequence (SEQ ID NO: 254): 



1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV
51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING
101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP
151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI
201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL
251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE
301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL
451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX
551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH
651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK
701 TVETYA*



ORF64a (SEQ ID NO: 254) and ORF64-1 (SEQ ID NO: 252) show 96.6% identity in 706 aa
overlap: 



20 



10 20 30 40 50 60 

orf 64a . pep MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

M 1 1 1 1 M Ml I [ 1 1! 1 1 1 11 1 1 ! M 1 11 1 M Mill I 

orf 64 - 1 MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSANLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 



25 



70 80 90 100 110 120 

orf 64a . pep DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

1 1 1 1 1 , 1 1 li 1 1 1 1 II II 1 1 1 1 1 1 1 i 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 M 11 

orf 64 - 1 DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 

70 80 90 100 110 120 



30 



130 140 150 160 170 180 

orf 64a . pep SKSALNLAADNALGNA I PVQ I DX I GAASLPXDMGRVLEHY AGSGFAQLAL YNAASGKIEK 

lllllllllllli MINI lllllll I IMIIMMIIIIMIII llllll I 

orf 64 - 1 SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 

130 140 150 160 170 180 



35 



190 200 210 220 230 240 

orf 64a . pep S INPHKLDQPFPGKARWEKIQQAGSVRDXES IGGVLYAXGWLSAXTHNGRDYALFFRQPV 

MM llllllll IMIIMIIIII IIIMIII Mill IIIIIIIIIIIIMI 

orf 64-1 SINPHKLDQP FPGKARWEKI QRAGSVRDLES I GGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 



40 



250 260 270 280 290 300 

orf 64a . pep PKGVAEDAVL IE KARAXXXXLSYSKKGLQTFFLATLL IASLLS I FLALVMALYFARRFVE 

1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 I M I Ml M I M I M M 1 1 M I II 1 1 1 1 M I M II II M 

orf 64 - 1 PKGVAEDAVL I EKARAKYAELSYSKKGLQTFFLATLL IASLLS I FLALVMALYFARRFVE 

250 260 270 280 290 300 



45 



310 320 330 340 350 360 

orf 64a . pep PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

I II II I II MM Mill II II I II II 1 1 Mill lllllllllllli II I INI Mill II 

orf 64 - 1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

310 320 330 340 350 360 



50 



370 380 390 400 410 420 

orf 64a . pep RHYLECVLEGLTTGWVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

I I M II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 64 - 1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 
430 440 450 460 470 480

orf 64a . pep AE VFAA IGAAAGTDKPVHVKYAAPDDAKI LLGKATVLPEDNXNGVVMV IDD I TVL I HAQK 

IIIMIIIMIIIIIIIIMIIIIIIIIIIIIIIIIIIMI 1 1 1 1 M 1 1 1 1 1 1 1 1 II I 

orf 64 - 1 AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 

430 440 450 460 470 480 



10 



490 500 510 520 530 540 

orf 64a. pep EAAWGEVAKRLAHE IRNPLTP IQLS AERLAWKLGGKLDEXDAQ I LTRSTDT I I KQVAALK 

I 1 1 M I 1 1 1 1 1 1 1 ! I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I I M II i I ! MIIMIMM MIMIII 

orf 64 - 1 EAAWGEVAKRLAHE IRNPLTP IQLSAERLAWKLGGKLDEQDAQ I LTRSTDT I VKQVAALK 

490 500 510 520 530 540 



15 



550 560 570 580 590 600 

orf 64a . pep EMVEAFRNYXRSPSXQLENQDLNALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ 

MINI | I I I M M II I I I I II I M II I I I I I I II I I II I II :||||||||| 
orf 64 - 1 EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 



20 



610 620 630 640 650 660 

orf 64a . pep VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I J I I I I I M I I I I I I 

orf 64 - 1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 



25 



670 680 690 700 

or f 64a . pep P AGTGLXL PWKK 1 1 EEHGGX I S LSNQD AGGAX VR I ILPKTVETYAX 

1 1 1 1 1 1 IMIIIMIIIII Ml Mill 1 1 1 1 1 ■ I I I : I I I 1 

or f 64 - 1 PAGTGLGLPWKKI IEEHGGRISLSNQDAGGACVRI ILPKTVKTYAX 

670 680 690 700 



Homology with a predicted ORF from N. gonorrhoeae 



ORF64 (SEQ ID NO: 250) shows 86.6% identity over a 387aa overlap with a predicted ORF 
(ORF64.ng) (SEQ ID NO: 256) from N. gonorrhoeae: 






orf 64 .pep 
orf 64ng 
orf 64 .pep 
orf 64ng 
orf 64 .pep 
orf 64ng 
orf 64 .pep 
orf 64ng 
orf 64 .pep 
orf 64ng 



MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

MINIMUM I IIIIIMIMIIIIMIII MMIMMIMM IMIIII 

MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 

DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 

I Ih II II I II llllll I I hi llh I I MM I I II I Ml I II II II I! II 

DRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 



60 



60 



120 



119 



180 



LS KS ALNLAADNALGNAVP VQ I DL I GAAS L PGDMGRVLEH YAGSG FAQLAL YNXASGK I E 

I II I I Ml I I I I '-I I II II I I I I Ml I I Ml I I I I I I I I I I I I I I I llllll 
LSKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLALYNAASGKIE 179 

KS INPHKLDQPFPGKARWEKIQRAGSVRDLES IGGVLYAQGWLSAGTHXGRDYALFFRQP 24 0 

M II I I : : I I I : I I = II = I M : I I II = I I II I II I II II II II I I I MIIMIMM 

KS INPHQFDQPLPDKEHWEQIQQTGSVRSLES I GGVL Y AQGWLS AGTHNGRD YAL F FRQ P 23 9 

VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 300 

M-IIMM II M MM I MM II II I Ml IIMIIMIIMI MUM llllll 1 1 

I PENVAQDAVLI EKARAKYAELS YSKKGLQTFFLVTLLI ASLLS I FLALVMALYFARRFV 2 9 9 
orf64.pep EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360

I hll IIIIIIII1IIIMIIIIIIIIIMMM I I i I I I I i I I I 1 I : I I I I 

orf64ng EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 359 

orf 64 .pep ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAGT 394 

I M M I I I : I I I I I I M :| I 

orf 64ng ARHYLECVLDGLTTGVWSYPLSCCRTAVFSTCHSSPLSYF 400 

An ORF64ng nucleotide sequence (SEQ ID NO: 255) was predicted to encode a protein having
amino acid sequence (SEQ ID NO: 256):

1 MRRFLPIAAI CAWLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVWSYP LSCCRTAVFS TCHSSPLSYF*

Further work revealed the complete gonococcal DNA sequence (SEQ ID NO: 257): 

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA

51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG 

251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC

301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG

351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA 

401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG 

451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC

551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT 

601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC 

701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG 

751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG

801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA 

901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT 

951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA 

1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC

1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT 

1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT 

1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG

1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT

1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT
1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT

1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA 

1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG 

1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC 
1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG
1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA
1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG
1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC
1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG
2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC
2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA
2101 ACGGTAGAAA CTTATGCGTA G



This corresponds to the amino acid sequence (SEQ ID NO: 258; ORF64ng-1):

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV
51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING
101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS
151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI
201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL
251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE
301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD
351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA
401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL
451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT
501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA
551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ
601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH
651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK
701 TVETYA*




ORF64ng-1 (SEQ ID NO: 258) and ORF64-1 (SEQ ID NO: 252) show 93.8% identity in 706 aa
overlap:






orf 64ng-l .pep 



orf 64-1 



10 20 30 40 50 60 

MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I h I I I I I I I I I I I I I I I I I I I I I I 
MRRFLPIMICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

10 20 30 40 50 60 






orf 64ng-l .pep 



orf64-l 



70 80 90 100 110 120 

DRRNGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL 

IIMIIII MIIII'IIIIIMIMI Mill IMIIMMIUIIIII Ml 

DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 
70 80 90 100 110 120 



130 140 150 160 170 180

orf 64ng- 1 . pep SKSALDLAADNAVSNAVPVQ I DL I GTASLSGNMGSVLEHYAGSGFAQLAL YNAASGKIEK 

llll|:||||||::||llllllllhlll HI I I I I I I I I I I I I I II I I I I I I I I II 

orf 64 - 1 SKSALNLAADNALGNAVPVQ I DL I GAASLPGDMGRVLEHYAGSGFAQLAL YNAASGKIEK 

130 140 150 160 170 180 






orf 64ng-l . pep 



orf64-l 



190 200 210 220 230 240 

SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 

|||||::|||:| I = I I = I I :: I I I I : I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I = 

SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 

190 200 210 220 230 240 






orf 64ng-l .pep 



250 260 270 280 290 300 
PENVAQDAVL I EKARAKYAELSYSKKGLQTFFLVTLLI AS LLS I FLALVMALYFARRFVE 
I I 1 = I I I II I I 1 1 I I I M I I I I I I M = I I M I I I M I I I I I I I I 1 I 




orf 64 - 1 PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 64ng- 1 . pep PILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

hi I ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Ml 1 1 1 1 1 M i 1 1 1 1 1 Ml 1 1 II 1 1 1 II 1 1 1 1 1 1 1 1 1 

or f 64 - 1 PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLS IAKEADERNRRREEAA 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf 64ng- 1 . pep RHYLECVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 

MIMMMIMM IMIMI 1 1 II M 1 1 1 II I M I M M 1 1 1 1 1 II I II 1 1 1 1 1 1 1 

orf 64 - 1 RHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 



430 440 450 460 470 480 

orf 64ng- 1 . pep AEVFAAI GAAAGTDKPVQVEYAAPDDAKI LLGKATVLPEDNGNGWMVI DD I TVL I RAQK 

1 1 1 II 1 1 1 II II I M I MM 1 1 1 1 1 III II M 1 1 1 1 II ! I M 1 1 II I II II II I M 1 1 

orf 64 - 1 AE VFAA I G AAAGTD KP VHVKYAAPDD AK I LLGKAT VLPEDNGNG WMV I DD I TVL I HAQ K 

430 440 450 460 470 480 



490 500 510 520 530 540 

orf64ng-l .pep EAAWGEVAKRLAHE I RNPLTP I QLS AERLAWKLGGKLDDQDAQ I LTRSTDT 1 1 KQVAALK 

1 1 1 1 1 1 1 II II I II 1 1 1 1 1 M I IM M 1 1 M 1 1 1 Ml I III 1 1 1 1 1 1 1 M II I M 

orf 64 - 1 EAAWGEVAKRLAHE I RNPLTP I QLS AERLAWKLGGKLDEQDAQ I LTRSTDT I VKQ VAALK 

490 500 510 520 530 540 



550 560 570 580 590 600 

orf 64ng- 1 . pep EMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGEPLMMAADTTAMRQ 

1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MINIM MINIUM 

or f 64 - 1 EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 



610 620 630 640 650 660 

orf 64ng- 1 . pep VLHNIFKNAAEAAEEADMPEVRVKSETGQDGRIVLTVCDNGKGFGKEMLHNAFEPYVTDK 

M 1 1 M M 1 1 1 II 1 1 1 M MM 1 1 II II 1 1 II II 1 1 1 1 M I II I M 1 1 1 II II I II 1 1 

orf 64 - 1 VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 



670 680 690 700 

or f 64ng- 1 . pep PAGTGLGLPWKKI IGEHGGRISLSNQDAGGACVRI ILPKTVETYAX 

35 | || | || || I I I I I II II II I I I II I I I II I II II II II II M II I I 

orf 64 - 1 PAGTGLGLPWKKI IEEHGGRISLSNQDAGGACVRI ILPKTVKTYAX 

670 680 690 700 



Furthermore, ORF64ng-1 (SEQ ID NO: 258) shows significant homology to a protein (SEQ ID
NO: 1129) from A. caulinodans:

sp|Q04850|NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi|77479|pir||S18624 ntrY
protein - Azorhizobium caulinodans >gi|38737 (X63841) NtrY gene product
[Azorhizobium caulinodans] Length = 771
Score = 218 bits (550), Expect = 7e-56
Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)



Query: 7 IAAICAWLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66 

I+A+ ++L GLT + + + R++KRG 

Sbjct: 35 ISALATFLILMGLTPWPTHQWIS VLLVNAAAVLILSAMVGREIWRIAKARARGR 90 
Query: 67 FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126

++ + R+ G+F +V+V+P + + +++ ++ + + WF T E + S++++++ + 
Sbjct: 91 AAARLH I RI VGLFA WS WPA I LVA WASLTLDRGLDRWFSMRTQE I VASS VS VAQTYVR 150 

Query: 127 LAADNAVSNAVPVQ IDL I GTASLSGNMGS VLEHYAG - - SGFAQLALYNAASGKI EKS INP 184 
5 AN+ + +DL S+ YGSFQ+ AA - + ++ ■ 

Sbjct: 151 EHALN I RGD I LAMS ADLTRLKS V YEGDRSRFNQ I LTAQAALRNLPGAML I 200 

Query: 185 HQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYA 233 

+ D++++ 1+ V + +IG Q + N DY 

Sbjct: 201 RR-DLSWERAN-VNIGREFIVPANLAIGDATPDQPVIYLP- - NDADYVAAWPLKDYDD 256 

10 Query: 234 - -LFFRQPIPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTXXXXXXXXXXXXXVMA 291 

L++I V ++AYL + G+Q F + + 

Sbjct: 257 LYLYVARLIDPRVIGYLKTTQETLADYRSLEERRFGVQVAFALMYAVITLIVLLSAVWLG 316 

Query: 292 LYFARRFVEPILSLAEGAKAVAQGDFSQTRPVLRND-EFGRLTKLFNHMTEQLSIXXXXX 350 
L F++ V PI L A VA+G+ P+ R + + L + FN MT +L 

15 Sbjct: 317 LNFS KWLVAP I RRLMS AADHVAEGNLDVRVP I YRAEGDLASLAETFNKMTHELRSQREAI 376 

Query: 3 51 XXXXXXXXXXXHYLECVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGW 410 

+ E VL G+ GV+ D + R+ N++AE++LG L+ + RH 
Sbjct: 377 LTARDQ I DSRRRFTEAVLSGVGAGVIGLDSQER I T I LNRS AERLLG - - LS EVEALHRHLA 434 

Query: 411 HGVSAQQSLLAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGWM 467 

20 V LL E + VQ D + + V E + +G V+ 

Sbjct: 435 EWPETAGLLEEA EHARQRS VQGN I TLTRDGRERVFAVRVTTEQS PEAEHGWW 488 

Query: 468 VIDD I TVL I RAQKEAAWGE VAKRLAHE I RNPLTP IQLS AERLAWKLGGKLDDQDAQ I LTR 527 

+DDIT LI AQ+ +AW +VA+R+AHEI+NPLTPIQLSAERL KG + QD +1 + 
Sbjct: 489 TLDDITELI SAQRTSAWADVARRI AHE I KNPLTP IQLSAERLKRKFGRHV - TQDREI FDQ 547 

25 Query: 52 8 STDTIIKQVAALKEMVEAFRNYARAPSLKLENQDLNALIGDVLALYEAGPCRFEAELAGE 58 7 

TDTII+QV + MV+ F ++AR P +++QD++ +1 + L G + 
Sbjct: 548 CTDTIIRQVGDIGRMVDEFSSFARMPKPWDSQDMSEIIRQTVFLMRVGHPEWFDSEVP 607 

Query: 588 PLMMAA - DTTAMRQVLHN I FKNXXXXXXXXDMPEVRVK S ETGQDGR I VLTVCD 63 9 

PMA D +QLNIKN P+VR + + G+D +V+ + D 

30 Sbjct: 608 PAMPARFDRRLVSQALTNILKNAAEAIEAVP- PDVRGQGRIRVSANRVGED- -LVIDI ID 664 

Query: 640 NGKGFGKEMLHNAFEPYVTDKPAGTGLGLPWKKI IGEHGGRISLSNQDAG- GACVRI IL 698 

NG G +E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L 
Sbjct: 665 NGTGLPQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724

Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
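
The BLAST summary line quoted in this example reports identities, positives and gaps as counts over the alignment length, with a rounded percentage in parentheses. As a quick consistency check, the following Python sketch parses such a line and recomputes the percentages. The function name `parse_blast_summary` and the layout of the returned dictionary are illustrative assumptions; they are not part of BLAST or of the analysis described in this specification.

```python
import re


def parse_blast_summary(line):
    """Return {label: (count, length, percent)} for each 'Label = n/m' field.

    Hypothetical helper for illustration only.
    """
    stats = {}
    for label, count, length in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        count, length = int(count), int(length)
        stats[label] = (count, length, 100.0 * count / length)
    return stats


# Summary line copied from the ORF64ng-1 / NtrY comparison in the text.
summary = "Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%)"

for label, (count, length, pct) in parse_blast_summary(summary).items():
    print(f"{label}: {count}/{length} = {pct:.1f}%")
```

The recomputed values should agree with the rounded percentages printed in the summary line.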



Example 31 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 259):

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC
151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT
201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT
251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG
301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC
351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG
451 CACGCGTTGG ATACG. . .



This corresponds to the amino acid sequence (SEQ ID NO: 260; ORF66): 



1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP 
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA 
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG 
151 HALDT. . . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 261): 



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC
151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT
201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT
251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG
301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC
351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC
451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG
501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC
551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG
601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC
651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA



This corresponds to the amino acid sequence (SEQ ID NO: 262; ORF66-1): 



1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG
151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*
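
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating the DNA codon by codon. The Python sketch below does this with the standard genetic code, using the first 50 nucleotides of SEQ ID NO: 261 as input. The names `CODON_TABLE` and `translate` are illustrative assumptions, and the use of the standard code is likewise an assumption; the specification does not state which software produced its translations.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from the first base; stop after the first stop codon ('*').

    Whitespace is ignored so that sequence rows can be pasted as printed.
    Codons containing non-ACGT characters are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


# First row (nucleotides 1-50) of SEQ ID NO: 261 as printed in the text.
first_row = "ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT"
print(translate(first_row))
```

The result can be compared against the start of the ORF66-1 amino acid sequence (SEQ ID NO: 262).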



Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical protein o221 (SEQ ID NO: 1130) of E. coli (accession number
P37619)

ORF66 (SEQ ID NO: 260) and o221 protein (SEQ ID NO: 1130) show 67% aa identity in 155aa
overlap:
orf66    1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
           M  F+  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
o221     1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

orf66   61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
o221    61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

orf66  121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
           +GQILD+ VFN+LR+ + WW+AP AST+ G+  DT
o221   121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155





Homology with a predicted ORF from N. meningitidis (strain A) 

ORF66 (SEQ ID NO: 260) shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) 
(SEQ ID NO: 264) from strain A of N. meningitidis: 

10 20 30 40 50 60 

MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 

II III II I II MM Mill III IMMIMIM III 1 1 1 1 1 1 i I i 1 1 1 II 1 1 1 1 1 1 1 1 

MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFI FLATDLTV 
10 20 30 40 50 60 

70 80 90 100 110 120 

RIFGSHLARR I I FWVMFPALLLSYVFSV LFHNGSWTGLGALSEFNTFVGR I A LASFAAYA 

II 1 1 1 M II II M M M I II 1 1 M 1 1 M II I II M II 1 1 1 1 M I II 1 1 1 1 M 1 1 II 1 1 M 

RIFGSHLARR I I FWVMFPALLLSYVFS VLFHNGSWTGLGALSEFNTFVGR I ALASFAAYA 
70 80 90 100 110 120 

130 140 150 

I GQ I LD I F VFNKLRRL KAWW I APNAS TV I GHALDT 
: I I I I I I I I I I I I I I I I I I h I I : I I I I I I : I I I I 

LGQ I LD I FV FNKLRRLKAWWVAPTAS TVI GNALDTLVFFAVAF YAS SDGFMAANWQG I AF 
130 140 150 160 170 180 

VDYLFKLT VCGLFFLPAYGVILNLL TKKLTTLQTKQAQDRPAPSLQNPX 
190 200 210 220 

The complete length ORF66a nucleotide sequence (SEQ ID NO: 263) is: 



orf 66 .pep 
orf 66a 

orf 66 .pep 
orf 66a 

orf 66 .pep 
orf 66a 

orf 66a 



1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC 

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 
This encodes a protein having amino acid sequence (SEQ ID NO: 264):

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG
151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV
201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

ORF66a (SEQ ID NO: 264) and ORF66-1 (SEQ ID NO: 262) show 97.8% identity in 228 aa 
overlap: 

10 20 30 40 50 60

orf66a.pep MYAFTAAQQQ KALFWLVLFHILI I AASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV 

IIIIIIIIIIIMI IIIIIIIIIIIM illlll MIIIIIMIM I llllll 

orf 66-1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 

10 20 30 40 50 60 

70 80 90 100 110 120

orf 66a. pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 

1 1 1 M I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 ■ 1 1 1 M 1 1 1 1 

orf 66-1 RIFGSHLARRI I FWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRI ALAS FAAYA 

70 80 90 100 110 120 

130 140 150 160 170 180

orf66a.pep LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF
: I I I I I I I I I I I I I I I I I I :i I I I : I I I I II I I I I I I M I I I I I I I I I ■ I I I I I I I I I 
orf 66-1 I GQ I LD I FVFNKLRRLKAWWI APTASTVI GNALDTLVFFAVAFYAS SDGFMAANWQG I AF 

130 140 150 160 170 180 

190 200 210 220 229

orf 66a . pep VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 

Mil Mill 1 1 1 1 1 1 1 M MM I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 

orf 66 - 1 VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 

190 200 210 220 

Homology with a predicted ORF from N. gonorrhoeae

ORF66 (SEQ ID NO: 260) shows 94.2% identity over a 155aa overlap with a predicted ORF 
(ORF66.ng) (SEQ ID NO: 266) from N. gonorrhoeae: 

or f 6 6 . pep MYAFTAAQQQKALFRLVLFHI L 1 1 AASNYLVQFPFQI FGIHTTWGAFS FPFI FLATDLTV 6 0 

1 1 1 M 1 1 1 II II M 1 1 M II 1 1 1 1 1 II II 1 1 1 1 1 M I I IIIIIIIIIIIM II III II 

orf66ng MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60

orf 66 .pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120 

Ml II II II II II II II III I II II MM II II MM II MM I MIIIIIMIM 

orf66ng RIFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120

orf 66 .pep IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155 

40 Mill II II MM II II 1 1 III I MM MM II I 

orf 66ng LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180 



The complete length ORF66ng nucleotide sequence (SEQ ID NO: 265) is: 
1 ATGTACGCAT TGACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT
51 GCTTTTCCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC
101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC
151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT
201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat
251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG
301 CtgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC
351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC
401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC
451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG
501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC
551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG
601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC
651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA



This encodes a protein having amino acid sequence (SEQ ID NO: 266): 



1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA
101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG
151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*



An alternative annotated sequence is: 

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG
151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV
201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

ORF66ng (SEQ ID NO: 266) and ORF66-1 (SEQ ID NO: 262) show 96.1% identity in 228 aa 
overlap: 



orf 66-1 .pep 



orf 66ng 



orf 66-1 .pep 
orf 66ng 
orf 66-1 .pep 
orf 66ng 



orf 66-1 .pep 



orf 66ng 



MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 

I M I I I I I I I I I I I I I I I M I I I I I I I I I II M I I I I I I I I ' I I i I I I I I I I I i 
MYALTAAQQQKALFRLVLFH I LI I AASNYLVQFPFRI FGIHTTWGAFSFPFI FLATDLTV 



I GQ I LD I FVFNKLRRLKAWW I APTAS TV I GNALDTLVFFAVAF YAS SDG FMAANWQG I AF 

HIIIIIMMIIIIIIIIIIM MINI llllllllllllll lllllllllll 

LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 
VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 22 9 

IIIIIIMIIIIIIIIIIIIIIIMIIM lllill MINIM 

VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 229 



60 



60 



RIFGSHLARRI I FWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRI ALAS FAAYA 12 0 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : I I I I I I ! M I I I I ' M : I II I I I M I I I I I I I I 
RIFGSHLARRI I FWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGR I ALAS FAAYA 12 0 



180 



180 



Furthermore, ORF66ng (SEQ ID NO: 266) shows significant homology with an E. coli ORF (SEQ
ID NO: 1130):

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC REGION
(O221) >gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221

Score = 273 bits (692), Expect = 5e-73

Identities = 132/203 (65%), Positives = 155/203 (76%)

Query: 1 MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60 

M + Q+ KALF L LFH+L+I +SNYLVQ P I G HTTWGAFS F P F I FLATDLTV 
Sbjct : 1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFI FLATDLTV 60 

Query: 61 RIFGSHLARRI IFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120 

RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA 
Sbjct: 61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120 

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180 

LGQILD+ VF++LR+ + WW+AP AST+ GN DTL FF +AF+ S D FMA +W IA 
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180 

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203 

VDY FK+ + +FFLP YGV+LN 
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203 



Based on this analysis, including the homology with the E. coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
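
The identity figures quoted in this example (for instance, 96.1% identity in a 228 aa overlap) express the share of aligned positions at which both sequences carry the same residue. A minimal Python sketch of that calculation follows. The function name `percent_identity` is a hypothetical helper, not taken from the alignment software used in the specification, and skipping gap positions is an assumption about how the overlap is counted. The two input strings are only the final 48-residue row of the ORF66-1 / ORF66ng alignment, so the value computed is for that row and not for the full overlap.

```python
def percent_identity(a, b, gap="-"):
    """Percent of aligned, non-gap positions where the two residues match.

    Both strings must already be aligned (same length). Positions where
    either sequence has a gap character are excluded from the overlap.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        raise ValueError("no overlapping positions to compare")
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


# Last alignment row (residues 181-228) of ORF66-1 and ORF66ng from the text.
row_orf66_1 = "VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNP"
row_orf66ng = "VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNP"

print(f"{percent_identity(row_orf66_1, row_orf66ng):.1f}% identity "
      f"over {len(row_orf66_1)} aligned residues")
```

Applying the same function to the complete aligned sequences would give the figure for the whole overlap.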



Example 32 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 267): 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC
51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT
101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT
151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA
201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA
251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC
301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC
351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC
401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA
451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA
501 TGGCTGCTAC GGCGTTGAT . .

This corresponds to the amino acid sequence (SEQ ID NO: 268; ORF72): 



1 MVIKYTNLNF AKLSI IAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD . . 
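
The partial sequence SEQ ID NO: 267 contains lower-case IUPAC ambiguity codes (such as `y`, `m`, `w` and `r`), and the corresponding residues in SEQ ID NO: 268 appear as `X` where the possible codons do not all encode the same amino acid. The Python sketch below illustrates that behaviour under stated assumptions: the standard genetic code, the usual IUPAC nucleotide codes, and hypothetical helper names (`IUPAC`, `translate_ambiguous`). It is not the procedure documented in the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes mapped to the bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna):
    """Translate codon by codon; emit 'X' when an ambiguous codon could
    encode more than one amino acid (or a stop)."""
    seq = "".join(dna.split()).upper()
    out = []
    for i in range(0, len(seq) - 2, 3):
        # Expand the codon into every concrete codon it may represent.
        choices = product(*(IUPAC[base] for base in seq[i:i + 3]))
        residues = {CODON_TABLE["".join(c)] for c in choices}
        out.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(out)


# In-frame excerpts of SEQ ID NO: 267 (nucleotides 82-99 and 208-216).
print(translate_ambiguous("GCAAAyGCAGTmwrAATA"))
print(translate_ambiguous("CACAyyCCT"))
```

In the first excerpt, `AAy` and `GTm` each resolve to a single residue, whereas `wrA` does not, which is consistent with the `X` at position 32 of SEQ ID NO: 268.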
Further work revealed the complete nucleotide sequence (SEQ ID NO: 269):

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC
51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT
101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT
151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA
201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA
251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC
301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC
351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC
401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC
451 TAA

This corresponds to the amino acid sequence (SEQ ID NO: 270; ORF72-1): 



1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 *

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF72 (SEQ ID NO: 268) shows 98.0% identity over a 147aa overlap with an ORF (ORF72a)
(SEQ ID NO: 272) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf72.pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I J I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf 72a MVIKYTNLNFAKLSI IAILMMYSFEANAN AVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf72.pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II III II I I I I I I 
orf72a DLI KTVDLTH I PTGAKAR I NAKI TAS VSRAGVLAGVGKLARLGAKFS TRAVP YVGTALLA 

70 80 90 100 110 120 

130 140 150 160 170 

orf72.pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD

I I M I ; I I I I I I I I ; I I I I I I I I h 
orf 72a HDVYETFKEDIQARGYQYDPETDKFAKVSGX 

130 140 150 

The complete length ORF72a nucleotide sequence (SEQ ID NO: 271) is:



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 
201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA
251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC
301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC
351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC
401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC
451 TAA



This encodes a protein having amino acid sequence (SEQ ID NO: 272):

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG
151 *

ORF72a (SEQ ID NO: 272) and ORF72-1 (SEQ ID NO: 270) show 100.0% identity in 150 aa
overlap:



10 20 30 40 50 60 

orf 72a. pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

I I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I M 
orf 72-1 MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72a . pep DLIKTVDLTH I PTGAKAR INAKI TAS VSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I i I I I I I I I I I II I I II I I I I I I I I I I I I I I 
orf 72 - 1 DLI KTVDLTHI PTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

70 80 90 100 110 120 

130 140 150 

orf 72a . pep HDVYETFKED I QARG YQ YD PETDKFAKVSGX 

MM Illlllll IIIIIIIIMIIIII! 

orf 72 - 1 HDVYET F KED I Q ARG YQ YD PETDKF AKVS GX 

130 140 150






Homology with a predicted ORF from N. gonorrhoeae 

ORF72 (SEQ ID NO: 268) shows 89% identity over a 173aa overlap with a predicted ORF 
(ORF72.ng) (SEQ ID NO: 274) from N. gonorrhoeae: 






orf 72 .pep 
orf 72ng 
orf 72 .pep 
orf 72ng 
orf 72 .pep 
orf 72ng 



MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 

II hlllll II II I II I I II I II II II II I I II hlllll II I hi I II II :|= III 
MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 



HDVYETFKED IQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 

IIIMIIIIIIIIII :|llllllllllllhllllllhlMIIIIIIIIII 
HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 



60 



60 



120 



DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

II hlllll lllllllllllllllllllllhllllhl lllhlllllllllMII 
DLTKAVDLTH I PTGAKAR I NAKI TAS VS RAGVLSGVGKLVRQGAKFGTRAVP YVGTALLA 120 



173 



180 
An ORF72ng nucleotide sequence (SEQ ID NO: 273) was predicted to encode a protein having
amino acid sequence (SEQ ID NO: 274): 

1 MVTKHTNLNF AKLSIIAILM MYSFEANAN A VKISETLSVD TGQGAKVHKF 

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE 

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL 

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS 

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP 

301 GTKVNMGPVT DRNGNPVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA 

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG 

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF 

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA 

501 EKIRFAVLLA FIIMSAFWFG SLGGE*

After further analysis, the following gonococcal DNA sequence (SEQ ID NO: 275) was identified: 



1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT



This corresponds to the amino acid sequence (SEQ ID NO: 276; ORF72ng-1):



1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF 



ORF72ng-1 (SEQ ID NO: 276) and ORF72-1 (SEQ ID NO: 270) show 89.7% identity in 145 aa
overlap:



                      10        20        30        40        50        60
orf72ng-1.pe  MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS
              || |:|||||||||||||||||||||||||||||||:|||||||||:||||||:|: |||
orf72-1       MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf72ng-1.pe  DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA
              || |:||||||||||||||||||||||||||||:|||||:| ||||:|||||||||||||
orf72-1       DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA
                      70        80        90       100       110       120

                     130       140
orf72ng-1.pe  HDVYETFKEDIQARGCRYDPETDKF
              ||||||||||||||| :||||||||
orf72-1       HDVYETFKEDIQARGYQYDPETDKFAKVSGX
                     130       140       150

Based on this analysis, including the presence of a putative leader sequence and transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
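The document does not state how putative leader sequences or transmembrane domains were predicted. One common way to look for hydrophobic stretches of this kind is a sliding-window hydropathy profile. The following is a minimal sketch, assuming the Kyte-Doolittle hydropathy scale and a 19-residue window; the function name `hydropathy_profile` and the treatment of unknown residues are assumptions of this illustration, not the method used here.

```python
# Kyte-Doolittle hydropathy values; higher means more hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    # Unknown residues (e.g. X) score 0.0; this is an assumption of the sketch.
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# N-terminal region of ORF72-1 as listed in this example.
n_terminus = "MVIKYTNLNFAKLSIIAILMMYSFEANANA"
profile = hydropathy_profile(n_terminus)
peak = max(profile)  # most hydrophobic 19-residue window
```

A sustained high window average suggests a candidate membrane-spanning or signal-peptide region; any cutoff applied to `peak` would be a further assumption.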

Example 33 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 277): 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCAAACCGGG 

151 gCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC . . 

This corresponds to the amino acid sequence (SEQ ID NO: 278; ORF73): 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI. . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 279): 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT 

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 280; ORF73-1): 



1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG 

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR 

151 SRNAIEHKKD E* 
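The correspondence between a listed nucleotide sequence and its stated amino acid sequence can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; `translate` and `CODON_TABLE` are hypothetical names introduced for this illustration. As a simplification, any codon containing an unresolved base (such as N) is reported as X, even where the ambiguity would not change the amino acid.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; '*' marks a stop codon."""
    # Drop the spaces used to group the listing into blocks of ten bases.
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# Start and end of SEQ ID NO: 279 as listed above.
print(translate("ATGAGATTTT TCGGTATCGG T"))  # MRFFGIG
print(translate("GACGAATAA"))                # DE*
```

The two fragments reproduce the first seven residues and the final "DE*" of ORF73-1 as listed.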

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 



ORF73 (SEQ ID NO: 278) shows 90.8% identity over a 76aa overlap with an ORF (ORF73a)
(SEQ ID NO: 282) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf73.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| |||||:|||:|||:||||||||
orf73a      MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70
orf73.pep   MRSGGKVSVYQMLWPI
            |||||:|||| ||| |
orf73a      MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM

The complete length ORF73a nucleotide sequence (SEQ ID NO: 281) is: 



1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA

This encodes a protein having amino acid sequence (SEQ ID NO: 282): 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG
51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR
151 FRNAXEHKKD E*

ORF73a (SEQ ID NO: 282) and ORF73-1 (SEQ ID NO: 280) show 91.3% identity in 161 aa
overlap:

                    10        20        30        40        50        60
orf73a.pep  MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVVMLRHTGLSGLLLAGAA
            ||||||||||||||||||||||||||||||||||||| |||||:||||||||||||||||
orf73-1     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf73a.pep  MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM
            |||||||||| ||| ||||||||| ||||||||| |||| ||||||||||||||||||||
orf73-1     MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
                    70        80        90       100       110       120

                   130       140       150       160
orf73a.pep  NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX
            | |||| | |||||||||||||| |||| | ||| |||||||
orf73-1     NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
                   130       140       150       160



Homology with a predicted ORF from N. gonorrhoeae

ORF73 (SEQ ID NO: 278) shows 92.1% identity over a 76aa overlap with a predicted ORF
(ORF73.ng) (SEQ ID NO: 284) from N. gonorrhoeae:

orf73.pep   MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60
            ||||||||||||||||||||||||||||||||||||| |||||||||:|||:||||||||
orf73ng     MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60

orf73.pep   MRSGGKVSVYQMLWPI 76
            ::|:||||||||||||
orf73ng     VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 120

The complete length ORF73ng nucleotide sequence (SEQ ID NO: 283) is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT
51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC
101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG
151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT
201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT
251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG
301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT
351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg
401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt
451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA



This encodes a protein having amino acid sequence (SEQ ID NO: 284): 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG 
51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL
101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR 
151 SRNAIEHEKD E* 

ORF73ng (SEQ ID NO: 284) and ORF73-1 (SEQ ID NO: 280) show 93.8% identity in 161 aa
overlap:



                     10        20        30        40        50        60
orf73-1.pep  MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA
             ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||
orf73ng      MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf73-1.pep  MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
             ::|:|:||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf73ng      VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM
                     70        80        90       100       110       120

                    130       140       150       160
orf73-1.pep  NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX
             ||||||||| :||||||||||||:| |||||||||||:||||
orf73ng      NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX
                    130       140       150       160

Based on this analysis, including the presence of a putative leader sequence and putative
transmembrane domain in the gonococcal protein, it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.

Example 34 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 285): 

1 ATGTTTGTTT TTCAGACGGC ATTCTT . ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC 

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC
451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA
501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA 
551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 
601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 
651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG 
701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 
751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC 
801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 
851 GCGAGGGAAA GAAAGCTTTG TACGAT. . 

This corresponds to the amino acid sequence (SEQ ID NO: 286; ORF75): 



1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A....AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 287): 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence (SEQ ID NO: 288; ORF75-1): 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results:



Homology with a predicted ORF from N. meningitidis (strain A) 



ORF75 (SEQ ID NO: 286) shows 95.8% identity over a 283aa overlap with an ORF (ORF75a)
(SEQ ID NO: 290) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKAXXXXAEDTR
                     ||||||||||||||||||||||||||||||||||||||||||    |||||
orf75a               MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR
                             10        20        30        40        50

                    70        80        90       100       110       120
orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR
                   60        70        80        90       100       110

                   130       140       150       160       170       180
orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV
            ||||:|||||||||| ||||||||||| ||||||||||||||||||||||||||:|||:|
orf75a      RVREVGFKVVPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV
                  120       130       140       150       160       170

                   190       200       210       220       230       240
orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM
            ||||||||||:||||||||||||||||||||||||||||||||||||||:|||:||||||
orf75a      MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM
                  180       190       200       210       220       230

                   250       260       270       280       290
orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75a      VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK
                  240       250       260       270       280       290

orf75a      X

The complete length ORF75a nucleotide sequence (SEQ ID NO: 289) is:

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA
401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC
501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 
551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 
601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCC GCCAAAATGA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 

851 TGGCACTGTC TTGGAAAAAC AAATGA 


This encodes a protein having amino acid sequence (SEQ ID NO: 290): 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*

ORF75a (SEQ ID NO: 290) and ORF75-1 (SEQ ID NO: 288) show 98.3% identity in 291 aa
overlap:

                    10        20        30        40        50        60
orf75a.pep  MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf75a.pep  GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKV
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||:||||
orf75-1     GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf75a.pep  VPVVGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIG
            |||||||||||||||||| ||||||||||||||||||||||||||:|||:||||||||||
orf75-1     VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf75a.pep  ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75-1     ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
                   190       200       210       220       230       240

                   250       260       270       280       290
orf75a.pep  EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75-1     EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                   250       260       270       280       290

Homology with a predicted ORF from N. gonorrhoeae 

ORF75 (SEQ ID NO: 286) shows 93.2% identity over a 292aa overlap with a predicted ORF
(ORF75.ng) (SEQ ID NO: 292) from N. gonorrhoeae:

orf75.pep   MFVFQTAFXMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKA    AEDTR 56
            | |||||| ||||||||||||||||||||||||||||||||||||||||||    |||||
orf75ng     MSVFQTAFFMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR 60

orf75.pep   VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR 116
            |||||||||||||:|||||||||||||||::|:||||:||||||||||||||||||||||
orf75ng     VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLAR 120

orf75.pep   RVREAGFKVVPVVGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 176
            ||||||||||||||| |||||||||||  |||||||||||||||||||||||||||||:|
orf75ng     RVREAGFKVVPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVV 180

orf75.pep   MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236
            ||||||||||:||||||||||||||||||||||||||||||||||||||:|||:||||||
orf75ng     MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 240

orf75.pep   VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 288
            ||||||||||||||||||||| ||||:|||||||||||||||||||||||||
orf75ng     VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300

An ORF75ng nucleotide sequence (SEQ ID NO: 291) was predicted to encode a protein having
amino acid sequence (SEQ ID NO: 292):

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK

301 *

After further analysis, the following gonococcal DNA sequence (SEQ ID NO: 293) was identified:

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT

851 TGGCACTGTC GTGGAAAAAC AAATGA



This corresponds to the amino acid sequence (SEQ ID NO: 294; ORF75ng-1):

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*



ORF75ng-l (SEQ ID NO: 294) and ORF75-1 (SEQ ID NO: 288) show 96.2% identity in 291 aa 
overlap: 



                     10        20        30        40        50        60
orf75-1.pep  MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75ng-1    MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf75-1.pep  GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
             ||||:|||||||||||||||::|:||||:|||||||||||||||||||||||||||||||
orf75ng-1    GIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf75-1.pep  VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
             ||||||||||||||||||  |||||||||||||||||||||||||||||:||||||||||
orf75ng-1    VPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIG
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf75-1.pep  ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
             ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75ng-1    ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD
                    190       200       210       220       230       240

                    250       260       270       280       290
orf75-1.pep  EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
             |||||||||||| ||||:||||||||||||||||||||||||||||||||||
orf75ng-1    EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
                    250       260       270       280       290

Furthermore, ORF75ng-1 (SEQ ID NO: 294) shows significant homology to a hypothetical E. coli
protein (SEQ ID NO: 1131):

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)
>gi|606086 (U18997) ORF_f286 [Escherichia coli]
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region
[Escherichia coli] Length = 286

Score = 218 bits (550), Expect = 3e-56
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)

Query: 4   KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
           K  Q A +S   G LY+V TPIGNLADIT RAL VLQ  D+I AEDTR T  LL  +GI
Sbjct: 2   KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64  GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123
            RL ++ +HNE+Q A+ ++  L +G  +A VSDAGTP + DPG  L R  REAG +VVP+
Sbjct: 60  ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183
            G  A + ALS AG+    F + GF+P KS  RR            ++ +E+ HR+  +L
Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242
            D+  +  E R ++LARE+TKT+ET     VGE+   +  D N+ +GEMVL++      +
Sbjct: 180 EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286
            E L   A   + +L AELP K+AA LAA+I G  K ALY  AL
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282
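The percentages in the summary line of the database hit follow directly from the quoted counts. The following is a minimal sketch that parses such a line and recomputes each rounded percentage; the function name `parse_blast_stats` is hypothetical, and the regular expression assumes the "Name = n/m" layout shown above rather than any particular program's full output format.

```python
import re


def parse_blast_stats(line: str) -> dict:
    """Map each 'Name = n/m' field to (n, m, rounded percentage)."""
    stats = {}
    for name, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        n, m = int(num), int(den)
        stats[name] = (n, m, round(100 * n / m))
    return stats


line = "Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%)"
stats = parse_blast_stats(line)
print(stats["Identities"])  # (128, 284, 45)
```

Recomputing the values in this way is a quick consistency check on a transcribed alignment summary.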





Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 35 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 295): 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GC . AAAGCAC CCGAAATCGA CCCGGCTTTG 

//

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC
751 AAACCGTAA

This corresponds to the amino acid sequence (SEQ ID NO: 296; ORF76):

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL

//

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK

251 P*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 297):



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 
51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 



This corresponds to the amino acid sequence (SEQ ID NO: 298; ORF76-1): 



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV 

251 KP* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 



ORF76 (SEQ ID NO: 296) shows 96.7% identity over a 30aa overlap and 96.8% identity over a
31aa overlap with an ORF (ORF76a) (SEQ ID NO: 300) from strain A of N. meningitidis:

                    10        20        30
orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL
            |||||||||||||||||||| |||||||||
orf76a      MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

//

                                                 70        80        90
orf76.pep                                XELVRNQLEQGLRQEKARLKIDALLEENGVKPX
                                          ||||||||||||||||||||||:|||||||||
orf76a      DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX
                  200       210       220       230       240       250



The complete length ORF76a nucleotide sequence (SEQ ID NO: 299) is: 



1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 300): 



1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP*

ORF76a (SEQ ID NO: 300) and ORF76-1 (SEQ ID NO: 298) show 97.6% identity in 252 aa
overlap:

                    10        20        30        40        50        60
orf76a.pep  MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf76a.pep  AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||: |::|
orf76-1     AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf76a.pep  YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
            ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf76a.pep  LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1     LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
                   190       200       210       220       230       240

                   250
orf76a.pep  IDAILEENGVKPX
            |||:|||||||||
orf76-1     IDALLEENGVKPX
                   250



Homology with a predicted ORF from N. gonorrhoeae

The aligned N- and C-terminal aa sequences of ORF76 (SEQ ID NO: 296) and a predicted ORF
(ORF76.ng) (SEQ ID NO: 302) from N. gonorrhoeae show 96.7% and 100% identity over 30aa and
31aa overlaps, respectively:

orf76.pep   MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 30
            |||||||||||||||||||| |||||||||
orf76ng     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 60

//

orf76.pep                                ELVRNQLEQGLRQEKARLKIDALLEENGVKP 251
                                         |||||||||||||||||||||||||||||||
orf76ng     VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP 251



The complete length ORF76ng nucleotide sequence (SEQ ID NO: 301) is: 

1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA

151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT
351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA
451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC
501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc 
551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG 

25 601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc 

; 751 AaacCGTAA 



This encodes a protein having amino acid sequence (SEQ ID NO: 302):

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ
51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE
101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK
151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL
201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV
251 KP*
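
The relationship between the ORF76ng nucleotide listing and its encoded protein can be checked by translating the open reading frame. The sketch below is illustrative only: the helper names clean_sequence and translate are hypothetical, the standard genetic code is assumed, and position numbers and spacing in a listing are assumed to be the only non-letter characters that need stripping.

```python
import re

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def clean_sequence(listing: str) -> str:
    """Drop position numbers and whitespace; return upper-case letters only."""
    return re.sub(r"[^A-Za-z]", "", listing).upper()


def translate(listing: str) -> str:
    """Translate in frame 1; '*' marks a stop codon, 'X' an ambiguous codon."""
    dna = clean_sequence(listing)
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


print(translate("1 ATGAAACAGA AAAAGACCGC"))  # MKQKKT
print(translate("751 AaacCGTAA"))            # KP*
```

The two fragments are the first and last lines of the ORF76ng listing; they translate to the start (MKQKKT) and the end (KP followed by a stop) of the protein sequence given above.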

ORF76ng (SEQ ID NO: 302) and ORF76-1 (SEQ ID NO: 298) show 96.0% identity in 252 aa 
overlap 



40 10 20 30 40 50 60 

orf 76 - 1 . pep MKQKKTAAAV I AAMLAGFAAAKAPE I DPALVDTLVAQ I MQQADRHAEQSQKPDGQA I RND 

1 1 1 1 1 1 M 1 1 1 1 1 1 1 M II 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 hi 1 1 1 1 1 1 1 , 

orf 76ng MKQKKTAAAV I AAMLAGFAAAKAPE I DPALVDTLVAQ IMQQADRHAEQSQRPDGQAI RND 

10 20 30 40 50 60 



45 70 80 90 100 110 120 

orf 76-1 .pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 

1 1 1 II II M 1 1 1 1 1 M 1 1 1 1 II M I II 1 1 1 1 1 M I II 1 1 M 1 1 1 II 1 1 M II I h MM 

orf 76ng AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF

70 80 90 100 110 120

130 140 150 160 170 180 

orf 76-1 .pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 

MMillllMIIMIMI I IIIIIIIIMIIMIMIM IIIMIIIMMII 

5 orf 76ng YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 76 - 1 .pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 

Ml MM MINIM II MM Mill: lllllllllllllllllllllllllllll 

10 orf 76ng LASQFAGMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK 

190 200 210 220 230 240 

250 

orf 76-1 .pep IDALLEENGVKPX 

Illllllllllll 
15 orf76ng IDALLEENGVKPX 

250 

Furthermore, ORF76ng (SEQ ID NO: 302) shows significant homology to a B. subtilis export
protein precursor (SEQ ID NO: 1132):

sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S15269
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein [Bacillus
subtilis]
>gi|2226124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis]
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis]
Length = 292

Score = 50.4 bits (118), Expect = 1e-05

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%)

Query: 70 VLKNRALKEGLDK DKDVQNRFKIAEASF YAEEYVRFLERSETVSE 114 

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++ 

30 Sbjct: 53 VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112 

Query: 115 SA LRQFYERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPN 163 

A +++++E 1+ + A ++ A + ++ L KG FE L K Y 

Sbjct: 113 KAAKDNIKVTDADIKEYWEGLKGKIRASHILVADKKTAEEVEKKLKKGEKFEDLAKEYST 172 

Query: 164 DEQAFDG FIMAQQLPEPLASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDA 218

35 DAGFQ+E++ G+V+ DPVK Y++ K +E D 

Sbjct: 173 DSSASKGGDLGWFAKEGQMDETFSKAAFKLKTGEVS -DPVKTQYGYHI IKKTEERGKYDD 231 

Query: 219 QPFELVRNQLEQGLRQEKA 237 

EL LEQ L A 
Sbjct: 232 MKKELKSEVLEQKLNDNAA 250 

Based on this analysis, including the presence of a putative leader sequence and an RGD motif in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF76-1 (SEQ ID NO: 298) (27.8 kDa) was cloned in the pET vector and expressed in E. coli, as
described above. The products of protein expression and purification were analyzed by SDS-PAGE.
Figure 10A shows the results of affinity purification of the His-fusion protein. Purified
His-fusion protein was used to immunise mice, whose sera were used for Western blot (Figure
10B), ELISA (positive result), and FACS analysis (Figure 10C). These experiments confirm that
ORF76-1 (SEQ ID NO: 298) is a surface-exposed protein, and that it is a useful immunogen.
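
The RGD motif noted in the gonococcal protein can be located by a simple string search. The following is a minimal sketch under stated assumptions: find_motif and its offset parameter are hypothetical names, and the fragment is residues 181-200 of the ORF76ng sequence (SEQ ID NO: 302) as transcribed from the listing.

```python
def find_motif(protein: str, motif: str = "RGD", offset: int = 0) -> list[int]:
    """Return 1-based start positions of every occurrence of `motif`.

    `offset` is the number of residues that precede the fragment in the
    full-length protein, so that positions refer to the full sequence.
    """
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(offset + start + 1)
        start = protein.find(motif, start + 1)
    return positions


# Residues 181-200 of ORF76ng, so 180 residues precede the fragment.
fragment = "LASQFAGMNRGDVTRNPVKL"
print(find_motif(fragment, offset=180))  # [190]
```

In this numbering the motif occupies residues 190-192 of the gonococcal sequence.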

Example 36 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 303): 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCC1TACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG 

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG 

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence (SEQ ID NO: 304; ORF81): 

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 

51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT 

// 

401 . . . QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 
501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 305): 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG 

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG 

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG 






601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA 

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG
1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA
1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG
1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 
1551 GGCGGAATAT GTTTATCCGC AATGA 

This corresponds to the amino acid sequence (SEQ ID NO: 306; ORF81-1):

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK 

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF81 (SEQ ID NO: 304) shows 84.7% identity over an 85 aa overlap and 99.2% identity over a
121 aa overlap with an ORF (ORF81a) (SEQ ID NO: 308) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf 81 .pep MKKSFLTLVLYSSLLTAS EIAYPLELGIETLPAA KIAETFALTFVIAALYLFA RNKVTRL 

I III::: I I Mill MM III = : I I I I I I M I : II I I II I I I I I I II I M I Mill 
orf 81a MKKSLFVLFLYSSLLTAS EIAYRFVFGIETLPAA KMAETFALTFVIAALYLFA RYKATRL 

10 20 30 40 50 60 

70 80 
orf 81. pep LIAVFFAFSI I ANNVH YADYQSWMT 

IIIIIIIIIIMIIIIII lllhl 
o r f 8 1 a LIAVFFAFSI I ANNVH YAVYQSW I TG I N YWLMLKE I TEVGGAGASMLDKLW LPALWGVLE 

70 80 90 100 110 120 

// 

120 130 140
orf81.pep QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD

III II Ml I I II I M I I M I I I I I I I I 
orf 81a IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 
280 290 300 310 320 330 



150 160 170 180 190 200 

orf 81 .pep IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 

MIIIIIMIII IMIMI IMIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIIMIII 

orf 81a IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG 
340 350 360 370 380 390 



210 220 230 

orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
            ||||||||||||||||||||||||||||||||
orf81a      CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX

400 410 420 



The complete length ORF81a nucleotide sequence (SEQ ID NO: 307) is:



1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC 

51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC 

301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC
451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 
601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 
701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG 
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 
801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 
851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 
901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 
951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 

1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1251 GGCGGAATAT GTTTATCCGC AATGA



This encodes a protein having amino acid sequence (SEQ ID NO: 308): 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY
51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG
101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF
151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK
201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK
251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD
301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV
351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT
401 GNLITGDAGS LNIRDGKAEY VYPQ*

ORF81a (SEQ ID NO: 308) and ORF81-1 (SEQ ID NO: 306) show 77.9% identity in 524 aa
overlap:

10 20 30 40 50 60 

orf 81a . pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL 

5 I I I M I I I I I I I I I I I I I M I I I I I M II I I I I I I I I I I I I I M I M I I 

orf 81-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 

10 20 ' 30 40 50 60 

70 80 90 100 110 120 

orf 81a . pep LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE 
10 | I | | I I | | II I I ' I M I I I I I H I I I I I I I I h I I ■ h I ! I I I I i I II ;hl i I I I 

or f 8 1 - 1 LI AVFFAFS I IANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 81a. pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY

15 h 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 ! 

orf 81-1 VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 8 la. pep FVGRVLPYQLFDLSKIPVFKQPAPSRIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF 

20 II 1 1 1 1 1 1 1 1 1 1 1 h I h 1 1 1 1 1 1 h 1 1 1 1 h 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 81-1 FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 

190 200 210 220 230 240 . 

250 260 270 280 

orf 8 la. pep LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD 

25 | | : | | | | | | | | | | | | | | | | | | | | | | | | | | | | | : | | | | | || [ | | | | | | 

orf 81-1- LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 






orf 81a .pep 



orf 81-1 TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF

310 320 330 340 350 360 



290 300 310 320 

orf 81a .pep IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

35 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 81-1 I VLHQRGSHAP YGALLQPQDKVFGEAD I VDKYDNT I HKTDQM I QTVFEQLQKQPDGNWLF 

370 380 390 400 410 420 

330 340 350 360 370 380 

orf 81a . pep AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

40 I I 1 1 1 1 1 1 1 I 1 1 I I I I I 1 1 1 1 1 1 I I I 1 1 1 1 II 1 1 1 1 I I I 1 1 1 1 1 1 1 1 II I 1 1 1 1 1 1 1 1 1 1 

orf 81-1 AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

430 440 450 460 470 480 

390 400 410 420 

orf 81a .pep L I HTLGYDMP VSGCREGS VTGNL I TGDAGS LNI RDGKAE YVYPQX 

45 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I II 

or f 8 1 - 1 LI HTLGYDMPVS GCREGS VTGNL I TGDAGS LN I RDGKAE YVYPQX 

490 500 510 520

Homology with a predicted ORF from N. gonorrhoeae

The aligned aa sequences of ORF81 (SEQ ID NO: 304) and a predicted ORF (ORF81.ng) (SEQ ID
NO: 310) from N. gonorrhoeae show 82.4% and 97.5% identity over the N- and C-termini, in
overlaps of 85 aa and 121 aa, respectively:



orf81.pep   MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL  60
            ||||   | |||||||||||||    ||||||||| |||||||| ||||||||| |  ||
orf81ng     MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL  60

orf81.pep   LIAVFFAFSIIANNVHYADYQSWMT  85
            ||||||||| |||||||| ||||||
orf81ng     LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE  120

//

orf81.pep                                 QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD  433
                                          ||||||||| ||||||||||||||||||||
orf81ng     ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD  433

orf81.pep   IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG  493
            |||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||
orf81ng     IYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG  493

orf81.pep   CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ  524
            ||||||||||||||||||||| |||||||||
orf81ng     CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ  524



The complete length ORF81ng nucleotide sequence (SEQ ID NO: 309) is: 



1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC

51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG 

101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT 

151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC 

201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA
251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC 
301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG 

351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 
451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC 
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC 
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA 
601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG 
701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT 
801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA 
851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA
901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT 
951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG 

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT
1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC
1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA
1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA
1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA
1551 GGCGGAATAT GTTTATCCGC AATAA



This encodes a protein having amino acid sequence (SEQ ID NO: 310): 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD 

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRNGKAEY VYPQ* 



ORF81ng (SEQ ID NO: 310) and ORF81-1 (SEQ ID NO: 306) show 96.4% identity in 524 aa
overlap:



10 20 30 40 50 60 

orf 81ng-l.pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 
Mlh:: I | | | | | | | | | | | | | | | | | | | | | | J | | | : | | | | | | | | : | | | | | | | | | | | : : || 
25 orf 81-1 MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 81ng-1.pep LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE

I III II III: II II III II II II II II I II MINI II II I II II II II Nihil M I 

30 orf 81-1 LIAVFFAFSI IANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 81ng- 1 . pep VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 

IMMIIIIIIIII llllllllllll IMMMMIM IMMIIIMMIIIIM 

35 orf 81 - 1 VML FCSLAKFRRKTHFSAD I LFAFLMLM I FVRSFDTKQEHG I SPKPTYSR IKANYFSFGY 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 81ng-l .pep FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPF 
I I I I I II I I I I I M MM I I M I II II M I I I I I Ml I I Ml I I I I II I I I I I II 
40 orf 81-1 FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 81ng-l.pep LTRLSQADFKP I VKQS YSAGFMTAVS LPS FFNVI PHANGLEQI SGGDTNMFRLAKEQGYE 

M I IM 1 1 M 1 1 1 1 1 1 1 1 M 1 1 M 1 1 M 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II M 1 1 

45 orf 81-1 LTRLSQ AD FKP I VKQS YSAGFMTAVS LPS FFNA I PHANGLEQI SGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 



310 320 330 340. 350 360 

orf 81ng- 1 . pep TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF 

I M I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I h I I 
orf 81-1 TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF

310 320 330 340 350 360

370 380 390 400 410 420 

orf 81ng- 1 . pep IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

1 1 1 1 1 1 M 1 1 1 1 1 1 1 i 1 1 ! 1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 1 M 1 1 1 1 1 1 1 i I Ml 1 1 1 1 1 1 1 

5 orf 81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

370 380 390 400 . 410 420 

430 440 450 460 470 480 

orf 81ng- 1 . pep AYTSDHGQYVRQDIYNQGTVQPDSYIVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

II II MINI MINI II II II: II Mill II II III II II. II II II II II II I 

10 orf 81-1 AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

430 440 450 460 470 480 

490 500 510 520 

orf 81ng- 1 . pep L I HTLGYDMPVSGCREGS VTGNL I TGDAGSLN I RNGKAEYVYPQX 
I M I ■ II I I I I M I I I I M I I I I I I I I I I I I I M I I I I I I I M 
1 5 orf 81 - 1 L I HTLGYDMPVSGCREGS VTGNL I TGDAGSLN I RDGKAE YVYPQX 

490 500 510 520 

Furthermore, ORF81ng (SEQ ID NO: 310) shows significant homology to an E.coli OMP (SEQ ID 
NO: 1133): 

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli]

Length = 547
Score = 87.4 bits (213), Expect = 2e-16

Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)

Query: 25 VFGIETLPAAKMAETFA- LTFMI AALYLFARYKAS - - RLLIAVFFAFSMIANNVHYAVYQ 81 
25 VFGI LA+A LF+++R + RLL+A F + A ++ ++Y 

Sbjct: 29 VFGITNLVASSGAHMVQRLLFFVLT I LWKRI SSLPLRLLVAAPFVL- LTAADMS ISLY - 86 

Query: 82 SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134 

SW T G ++ + EV A ML ++ P L A + L + 
Sbjct : 87 SWCTFGTTFNDGFAI SVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVI IKYDV 141 

30 Query: 135 HFSADILFAFLMLMIFVRSF DTKQEHGISPKPTYSRIKAN- -YFSFGYFVG 183 

+ L+L++ S D K ++ SP SR +F+ YF 

Sbjct: 142 SLPTKKVTGILLLIVISGSLFSACQFAYKDAKNKNAFSPYILASRFATYTPFFNLNYFAL 201 

Query: 184 RVLPYQ- -LFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPFL 241 
+Q L + +P F+ + I VLI+GES ++ L+GY R T+P + 

35 Sbjct: 202 AAKEHQRLLS I ANTVPYFQL SVRDTGIDTYVLIVGESVRVDNMSLYGYTRSTTPQV 257 

Query: 242 TRLSQADFKPIVKQSYSAGFMTAVSLP SFFNVIPHANGLEQISGGDTNMFRLAKEQG 298 

+Q + Q+ S TA+S+P + +V+ H I N+ +A + G 

Sbjct: 258 E - - AQRKQ I KLFNQAI SGAPYTALSVPLSLTADS VLSH DIHNYPDNI INMANQAG 310 

Query: 299 YETYFYSAQA ENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQ 355 

40 ++T++ S+Q+ +N A+ + + ++ + Y G DE LLP + Q 

Sbjct: 311 FQTFWLSSQSAFRQNGTAVTSI AMRAMETVYVRGF DELLLPHLSQALQQ 359 

Query: 356 - -QGRHFIVLHQRGSHAPYGALLQPQDKVFGEADIVDK- YDNTIHKTDQMIQTVFEQLQK 412 

Q + IVLH GSH P + VF D D YDN+IH TD ++ VFE L+ 

Sbjct: 360 NTQQKKLIVLHLNGSHEPACSAYPQSSAVFQPQDDQDACYDNSIHYTDSLLGQVFELLK- 418 

Query: 413 QPDGNWLFAYTSDHG QYVRQDIYNQG--TVQPDSYIVPL-VLYSP 454
D Y +DHG ++++Y G +Y VP+ + YSP

Sbjct: 419 --DRRASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 464

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
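
The summary line of the BLAST report above gives each statistic both as a fraction and as a percentage, and the two can be cross-checked. The sketch below is illustrative only: parse_blast_stats and truncated_percent are hypothetical helper names, and the observation that the printed percentages match integer truncation rather than rounding is drawn from these numbers alone, not from any statement about how the program computes them.

```python
import re

# Matches fields such as "Identities = 122/468 (26%)".
STAT_PATTERN = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+) \((\d+)%\)")


def parse_blast_stats(line: str) -> dict[str, tuple[int, int, int]]:
    """Map each statistic name to (count, alignment length, printed percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in STAT_PATTERN.findall(line)
    }


def truncated_percent(count: int, length: int) -> int:
    """Percentage with the fractional part discarded."""
    return 100 * count // length


line = "Identities = 122/468 (26%), Positives = 198/468 (42%), Gaps = 70/468 (14%)"
for name, (count, length, printed) in parse_blast_stats(line).items():
    print(name, truncated_percent(count, length), printed)
```

For the ORF81ng report, 70/468 is about 14.96%, which the report prints as 14%; the identity and positive fractions agree with their printed values under either truncation or rounding.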

Example 37 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 311): 

1 ... ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC . GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC 

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC 

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A. . . 

This corresponds to the amino acid sequence (SEQ ID NO: 312; ORF83): 



1 . . TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA 

51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA 

101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD 

151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV. .

Further work revealed the complete nucleotide sequence (SEQ ID NO: 313): 



1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG 

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA 

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC 

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT 

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA 

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT 

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG 

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC 

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT 

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA 

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA 

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC 

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA
851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC
901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA

This corresponds to the amino acid sequence (SEQ ID NO: 314; ORF83-1): 

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL
51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY
101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT
151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF
201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE
251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP
301 DVGNEVIRRR KGG*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF83 (SEQ ID NO: 312) shows 96.4% identity over a 197 aa overlap with an ORF (ORF83a)
(SEQ ID NO: 316) from strain A of N. meningitidis:

10 20 30 40 50 

orf 83 . pep TLLLFIPLVLTX CGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 

. Ill :|||||| lllllll I I M I I I Ml I I II I I II I I I I II I I I I I I I I I I I 
20 orf 83a MKTLLXLIPLVLTA CGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 

60 70 80 90 100 110 

orf 83 . pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

II 1 1 1 1 1 1 1 1 M II 1 1 1 1 i II 1 1 Ml 1 1 II II II 1 II 1 1 " Mill 1 1 1 1 Mil II 

25 orf 83a YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

70 80 90 100 110 120 

120 130 140 150 160 170 

orf 83 . pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

II lllllll Mill MM III Mill MM II II Ml II I III III II II II II II III 

30 orf 83a TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

130 140 150 160 170 180 

180 190 
orf83.pep   IEVVPPXYADTDVFVTVDV
            |||||| ||||||||||||
orf83a      IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK

190 200 210 220 230 240 

The complete length ORF83a nucleotide sequence (SEQ ID NO: 315) is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC
51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC
101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG
151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA
201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA
251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC
301 CCCGGCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT
351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA
401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG
451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT
501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG
551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC
601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT
651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA
701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA
751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC
801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA
851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC
901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA

This encodes a protein having amino acid sequence (SEQ ID NO: 316): 



1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL
51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY
101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT
151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF
201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE
251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP
301 DVGNEVIRRR KGG*

ORF83a (SEQ ID NO: 316) and ORF83-1 (SEQ ID NO: 314) show 98.4% identity in 313 aa 
overlap: 



10 20 30 40 50 60 

25 orf 83a . pep MKTLLXLI PLVLTACGTLTGI PAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

Mill II 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 ! II 1 1 1 1 1 1 1 i 1 1 1 1 1 M M 1 1 : 1 

orf 83 - 1 MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 83a . pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

M 1 1 1 1 1 Ml 1 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 1 1 1 II 1 1 1 1 M 1 1 1 1 M 1 1 II 1 1 U 1 1 h 1 1 

orf 83 - 1 YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 83a . pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVS FLTNLIQTVFYLRG 

1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 83 - 1 TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVS FLTNLIQTVFYLRG 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf83a.pep  IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||
orf83-1     IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK

190 200 210 220 230 240 






250 260 270 280 290 300 

orf 83a . pep TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : M 1 1 1 1 1 II 1 1 M 1 1 1 1 1 1 II 1 1 1 1 1 1 1 M 1 1 1 1 M II 

orf 83 - 1 TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
250 260 270 280 290 300 



310 

orf83a.pep  DVGNEVIRRRKGGX
            ||||||||||||||
orf83-1     DVGNEVIRRRKGGX

310 

Homology with a predicted ORF from N. gonorrhoeae 

ORF83 (SEQ ID NO: 312) shows 94.9% identity over a 197 aa overlap with a predicted ORF
(ORF83.ng) (SEQ ID NO: 318) from N. gonorrhoeae:

orf 83 .pep TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 58 

Illhllllll lllllll I I I ! I I I I I I I I I I I U I I I I I I I I I I I I I I I I I I 
orf 8 3ng MKTLLLLI PLVLTACGTLTG I PAHGGGKRFAVEQELVAAS SRAAVKEMDLSALKGRKAAL 6 0 

10 orf 83 .pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 118 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I : II I : I II I I I I I I I I II I I I I I = I I I I 
orf 83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 12 0 

orf 83 .pep TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVS FLTNLIQTVFYLRG 178 
Illllllll I I I : I I I I I I I I M I i I I I I I I I I I I I I i I M I I I I I I ' II I I I M I I I 
15 orf83ng TSLLNAPAAALTKNNGRKGERS AGLS VNGTGDYRNETLLANPRDVS FLTNLIQTVFYLRG 180 

orf 83. pep IEWPPXYADTDVFVTVDV 197 

MINI IIIIIIIIIIII 

or f 8 3 ng I EWPPE YADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLI APK 240 
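
The identity figures quoted in this section are the fraction of aligned positions that carry the same residue. The following is a minimal Python sketch of that calculation, applied to the 60-residue row of the ORF83/ORF83ng alignment that begins YVSVMGDQGS. The alignment program and its parameters are not stated here, so the function name `percent_identity` and the treatment of gap characters are assumptions made for illustration only.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (identical, compared, percent) for two aligned rows of equal length.

    Assumption: a gap character ('-') in either row counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return identical, len(a), 100.0 * identical / len(a)


# One row of the ORF83 / ORF83ng alignment, copied from the listing above.
orf83_row = "YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS"
orf83ng_row = "YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS"

identical, compared, pct = percent_identity(orf83_row, orf83ng_row)
print(f"{identical}/{compared} identical ({pct:.1f}%)")
```

Applied row by row and summed over the whole overlap, the same count gives the overall percentage; positions reported as X (undetermined) simply count as mismatches in this sketch.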

The complete length ORF83ng nucleotide sequence (SEQ ID NO: 317) is:

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC
51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC
101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG
151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA
201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA
251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC
301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT
351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA
401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG
451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT
501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG
551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC
601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT
651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA
701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA
751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC
801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA
851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC
901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA
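
The sequences in this document are listed as a start position followed by blocks of ten residues. A minimal Python sketch for joining such a listing into one contiguous string, and for checking that each stated start position agrees with the number of residues already read, is given below. The function name `parse_listing` is illustrative and is not part of the original disclosure.

```python
import re


def parse_listing(text: str) -> str:
    """Join a numbered sequence listing into one contiguous upper-case string.

    Each non-blank line is expected to read '<start> <block> <block> ...'.
    The start position is checked against the residues read so far.
    """
    sequence = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.fullmatch(r"(\d+)\s+([A-Za-z*\s]+)", line)
        if match is None:
            raise ValueError(f"unrecognised listing line: {line!r}")
        start = int(match.group(1))
        residues = "".join(match.group(2).split()).upper()
        if start != len(sequence) + 1:
            raise ValueError(
                f"line starts at {start}, expected {len(sequence) + 1}"
            )
        sequence += residues
    return sequence


# First two rows of the ORF83ng nucleotide listing (SEQ ID NO: 317).
listing = """
1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC
51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC
"""
orf83ng_start = parse_listing(listing)
print(len(orf83ng_start), orf83ng_start[:12])
```

The position check is a deliberate design choice: a dropped or duplicated block in a transcribed listing raises an error instead of silently shifting the reading frame.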






This encodes a protein having amino acid sequence (SEQ ID NO: 318): 



1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL
51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY
101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT
151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF
201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE
251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP
301 DVGNEVIRRR KGG*
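
The relationship between a nucleotide listing and the protein it encodes can be checked by translating codon by codon. The sketch below is a minimal Python illustration that uses the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative, and a codon containing an ambiguous base is reported as X by assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; '*' marks a stop codon.

    Codons with ambiguous bases are reported as 'X' (an assumption).
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


# First 30 and last 12 nucleotides of the ORF83ng listing (SEQ ID NO: 317).
first_30_nt = "ATGAAAACCCTGCTCCTCCTCATCCCCCTC"
last_12_nt = "AAAGGAGGATAA"

print(translate(first_30_nt))
print(translate(last_12_nt))
```

The two fragments translate to the first ten residues and the final KGG* of SEQ ID NO: 318. The sketch does not treat an alternative initiation codon such as GTG as methionine; that case would need separate handling.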

ORF83ng (SEQ ID NO: 318) and ORF83-1 (SEQ ID NO: 314) show 97.1% identity in 313 aa
overlap:

5 10 20 30 40 50 60 

orf 83 - 1 . pep MKTLLLLI PLVLTACGTLTGI PAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

I M I 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 . 1 1 1 II 1 1 1 1 M Ml II 1 1 II 1 1 M 1 1 1 1 II 1 1 1 1 1 1 

orf 83ng MKTLLLLI PLVLTACGTLTGI PAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 

10 20 30 40 50 60 

10 70 80 90 100 110 120 

orf 83 - 1 . pep YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

M IIIMIIIIIIIIIIIIIII IIIMIIMIMIIIIIMI I IMIIMIM 

orf 83ng YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 

70 80 90 100 110 120 

15 130 140 150 160 170 180 

orf 83 - 1 . pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

II II MINI IIMI MM Ml MINIMUM INI MM M MINIM IN 
orf 83ng TS LLNAPAAALTKNNGRKGERS AGLS VNGTGDYRNETLLANPRDVS FLTNL IQTVF YLRG 

130 140 150 160 170 180 

20 190 200 210 220 230 240 

orf 83 - 1 . pep IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I I i 1 1 1 1 1 1 1 1 1 1 1 II 1 1 M 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 M 

orf 83ng IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

190 200 210 220 230 240 

25 250 260 270 280 290 300 

orf 83 - 1 . pep TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
I I I I I I I I I I I I I I hhll I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I II M 
orf83ng TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP 
250 260 270 280 290 300 

                   310
orf83-1.pep DVGNEVIRRRKGGX
            ||||||||||||||
orf83ng     DVGNEVIRRRKGGX
                   310


Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A
(P-loop) in the gonococcal protein (double-underlined) and a putative prokaryotic membrane
lipoprotein lipid attachment site (single-underlined), it is predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
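
A putative ATP/GTP-binding site motif A (P-loop) can be located by pattern search. The sketch below is a minimal Python illustration that assumes the PROSITE consensus [AG]-x(4)-G-K-[ST] for this motif; the tool actually used for the analysis is not stated here, and the function name `find_p_loops` is illustrative. The two test fragments are residues 251-270 of the gonococcal and meningococcal proteins as listed in this example.

```python
import re

# PROSITE-style consensus for motif A (P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# Residues 251-270 of ORF83ng (SEQ ID NO: 318) and of ORF83-1.
orf83ng_fragment = "QYALWMGPYSVGKTVKASDR"
orf83_1_fragment = "QYALWTGPYKVSKTVKASDR"

print(find_p_loops(orf83ng_fragment))
print(find_p_loops(orf83_1_fragment))
```

Under this assumed pattern the gonococcal fragment yields a hit at GPYSVGKT, whereas the meningococcal fragment, which reads VSKT at the corresponding position, yields none. This is consistent with the motif being noted for the gonococcal protein.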



Example 38

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID
NO: 319):

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

5 101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

10 3 51 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

4 01 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

4 51 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 

15 601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

20 851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 

951 gaAAGAAGTG ACGGaGTTGA TGTGccaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCG£CAAG TTGCCACATT GGGCGGAAAA 

25 1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence (SEQ ID NO: 320; ORF84): 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV
201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV
251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI
301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS
351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 321): 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT
51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA
101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG
151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA
201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA
251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC
301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG
351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG
401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC
451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA
551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC
601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC
651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC
701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA
751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC
801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT
851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA
901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT
951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC
1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC
1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC
1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG
1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA



This corresponds to the amino acid sequence (SEQ ID NO: 322; ORF84-1): 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV
201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV
251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI
301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS
351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN*



Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF84 (SEQ ID NO: 320) shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) 
(SEQ ID NO: 324) from strain A of N. meningitidis: 






10 20 30 40 50 60 

orf 84 .pep MAE I CL I TGTPGSGKTLKMVSMMANDEMFKPDEKA I RRKVFTN I KGLKI PHTY I ETDAKK 

MMMM MIMMMIMMIIIIMMI -MIMMMIM IIMMIMI I 

or f 8 4 a MAE I CL I TGT PGSGKTL KMVSMMANDEM FKPDENG I RRKVFTN I KGLKI PHTY I ETDAKK 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 84 . pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I II 1 1 1 1 1 1 1 1 1 1 1 II I M 1 1 II 1 1 1 II 

orf 84a LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 84 . pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

1 1 1 1 1 1 1 IMIMM IIMMMIMMIMMIIMIII II II MMM M 

orf 84a ID I FVLTQGS KLLDQNLRTLVRKHYH I ASNKMGMRTLLEWKI CADDPVKMASS AFS S I YT 

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 84 . pep LDKKVYDLYXXAEVHTVNKVKRSKW FYTLPVIVLLIPVFVGL SYKMLSSYGKKQEEPAAQ 

MINIMI M 1 1 M II 1 1 1 1 II 1 1 1 1 MM 1 1 M M 1 1 II I M 1 1 1 1 1 1 1 M IM 

orf 84a LDKKVYDLYESAEVHTVNKVKRSKW FYTLPVI ILLI PVFVGL SYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 






                   250       260       270       280       290       300
orf84.pep   ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI
            ||||||:|||: |||||||||||||||||||||||||| ||||||||||||||||||||:
orf84a      ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV
                   250       260       270       280       290       300

310 320 330 340 350 360 

or f 84 . pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

5 Illlllhllllllllllhl: | | | ||:: | | | M I I I I I I I I I I hllll II 

orf 84a EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 

310 320 330 340 ' 350 360 

370 380 390 

orf 84 . pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

10 Illllll lllllllhllllllllllllMIIIII 

orf 84a ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 

370 380 390 

The complete length ORF84a nucleotide sequence (SEQ ID NO: 323) is: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

2 01 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

2 51 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 
20 301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

3 51 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG 

4 01 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 
4 51 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

25 551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC 

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA 

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC 

30 801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA 

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC 

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT 

35 1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC 

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 324): 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP
51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN
151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV
201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV
251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV
301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS
351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN*






ORF84a (SEQ ID NO: 324) and ORF84-1 (SEQ ID NO: 322) show 95.2% identity in 395 aa
overlap:

10 20 30 40 50 60

orf 84a. pep MAE I CL I TGTPGSGKTLKMVSMMANDEMFKPDENG I RRKVFTNI KGLKI PHTY I ETDAKK 

1 1 M 1 1 1 1 1 1 - 1 1 1 1 1 1 II i 1 1 II 1 1 1 M 1 1 1 1 1 II I M 1 1 1 1 1 1 1 1 N 1 1 1 1 1 1 1 1 1 

orf 84 - 1 MAE I CL I TGTPGSGKTLKMVSMMANDEMFKPDENG I RRKVFTN I KGLKI PHTY I ETDAKK 

10 20 30 40 50 60 



10 



70 80 90 100 110 120 

orf 84a . pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
I I I I I II I I ' I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
orf 84 - 1 LPKSTDEQLSAHDMYEWI KKPENIGS I VIVDEAQDVWPARSAGSKI PENVQWLNTHRHQG 

70 80 90 100 110 120 



15 



130 140 150 160 170 180 

orf 84a . pep IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

MINIMI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 84 - 1 I D I FVLTQGPKLLDQNLRTLVRKH YH I ASNKMGMRTLLEWKI CADDPVKMAS S AFSS I YT 

130 , 140 150 160 170 180 



20 



190 200 210 220 230 . 240 

orf 84a. pep LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I E I I I I I I I I I I I I 
orf 84 - 1 LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 



25 



250 260 270. 280 290 .300 

orf 84a . pep ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 

I I I I I I : I I I : I I I I I t I I t I I I M I I I I I I I I 1 I I !! 1 1 I I I I I I I 1 1 t I 1 I I 1 I I I = 
orf 84-1 ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 



30 



310 320 330 340 350 360 

orf 84a . pep EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 

I I I I ! I M II I I I I I I I M : I I I I I I I I I I I I I I I I I I I I I I I hllll M 
or f 84 - 1 EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

310 320 330 340 350 360 



35 



370 380 390 

orf 84a . pep ATLGGKPWQNLMYDNWQERGKP FEG IGGGWGS ANX 

I I I i I It I I I t I I I I = I I I I I I I I I I I I I I I I I I I 
orf 84 - 1 ATLGGKPXQNLMYDNWEERGKP FEG IGGGWGS ANX 

370 380 390 



Homology with a predicted ORF from N. gonorrhoeae 

ORF84 (SEQ ID NO: 320) shows 94.2% identity over a 395aa overlap with a predicted ORF 
(ORF84.ng) (SEQ ID NO: 326) from N. gonorrhoeae: 



orf84.pep   MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 60
            |||||||||||||||||||||||||||||||||:::||||||||||||||||:|||||||
orf84ng     MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 60

orf84.pep   LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120
            |||||||||||||||||||||||:|:||||||||||||||||||||||||||||||||||
orf84ng     LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120

orf84.pep   IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 180

I I I I I I I I I I I I I I I I I I I- I Ml : I I I I : I M II h I I I M I i I I I M I I I II 
orf 84ng IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 180 

orf 84 .pep LDKKVYDLYXXAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 240 

5 Illllllll I :| I I I I I I I I Lhl I I hll hi I I I I I I I I M I I M I I M I 

orf 84ng LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 240 

orf 84 .pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 3 00 

I M I I I I I I I II I M I M II I I M M I I I I I III I I I I ;l I I I I I I I I I I I II I I 
orf 84ng ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 300 

10 orf 84 .pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 

1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ; ii 1 1 1 M 1 1 1 1 1 1 1 1 1 1 m 1 1 1 1 1 1 1 1 m 1 1 1 1 

or f 8 4 ng EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNP YKEESQGQEVQQSAQQHSDRAQV 360 

orf 84 . pep ATLGGKPXQNLM YDNWEERGKPFEG I GGGWGS AN 3 95 

lllllll Mlllll IIIIIIIMIIIIII 
15 orf84ng ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSAN 395 

The complete length ORF84ng nucleotide sequence (SEQ ID NO: 325) is: 

1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA 

20 101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG 

151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA 

201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg 

251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC 

3 01 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 
25 351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG 

401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC 

4 51 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC 
501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 
551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 

30 601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC 

651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 

751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 

801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 

35 851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC 

40 1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 326): 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP
51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR
101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN
151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV
201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV
251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI
301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS
351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN*

ORF84ng (SEQ ID NO: 326) and ORF84-1 (SEQ ID NO: 322) show 95.4% identity in 395 aa
overlap:

10 20 30 40 50 60 

orf 84-1. pep MAEI CLI TGTPGSGKTLKMVSMMANDEMFKPDENGI RRKVFTNI KGLKI PHT Y I ETDAKK 

1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 M < 1 1 1 1 1 1 1 1 1 1 hi 1 1 1 1 M 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 

orf 84ng MAE I CL I TGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTN I KGLKI PHTH I ETDAKK 

10. 20 30 40 50 60 



10 



15 



70 80 90 100 110 120 

orf 84 - 1 . pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

II I I I I I I I I .1 I I Ml I I I I I I : I: I M I I i I I I I I I I I I I I I I I I I I . I I I I I I i I M 
orf 84ng LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 84 - 1 . pep I D I FVLTQGPKLLDQNLRTLVRKHYH I ASNKMGMRTLLEWKI CADDP VKMAS S AFS S I YT 
I I I I I I I I I I I I M I I I M h : I I I I hill h I I I I I h I I I I I I I I II I I I I I I I I 
orf 84ng IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 

130 140 150 160 170 180 



20 



190 200 210 220 230 240 

orf 84 - 1 . pep LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 

Ihlllhl Ihllhlh lllllhllhl Ihlllllhllhlhllllllll 
orf 84ng LDKKVYDLYESAEIHTVNKVKRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ 

190 200 210 220 230 240 



25 



250 260 270 280 290 300 

orf 84 - 1 . pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 

MM II MINIMI II I Ml II II Mill Ml I M I M I II II II M I M I M 1 1 M 

orf 84ng ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 



30 



310 320 330 340 350 360 

orf 84 - 1 . pep EGGRTGCACYSHQGfALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

- I I II I h I I I I I I I I I I I I I I I I h I M I I I I II I I I I II I I I I I h I II I II I I II 
orf 84ng EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

310 320 330 340 350 360 



370 380 390 

or f 84 - 1 . pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

35 M II II I I II I II I I I II I I I II I I M II M I I I I 

orf84ng ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSANX 

370 380 390 

Based on this analysis, including the presence of a putative transmembrane domain (single-underlined)
in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop,
double-underlined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
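
A putative transmembrane domain is commonly flagged by scanning a protein for a strongly hydrophobic stretch. The method used for the prediction above is not stated, so the sketch below substitutes a Kyte-Doolittle hydropathy scan purely for illustration. The 19-residue window and the 1.6 cut-off are the conventional heuristic values for that scale and are assumptions here, as is the function name `best_window`.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def best_window(protein: str, window: int = 19) -> tuple[int, float]:
    """Return (1-based start, mean hydropathy) of the most hydrophobic window.

    Residues outside the 20 standard amino acids (for example X) are not
    handled and raise KeyError.
    """
    if len(protein) < window:
        raise ValueError("sequence shorter than window")
    best_start, best_mean = 0, float("-inf")
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start + 1, best_mean


# Residues 201-240 of ORF84ng (SEQ ID NO: 326), and a hydrophilic
# stretch (residues 231-250) for comparison.
candidate = "KRSKWFYALPVIILLIPLFVGLSYKMLGSYGKKQEEPAAQ"
hydrophilic = "GKKQEEPAAQESAATEQQAV"

start, mean = best_window(candidate)
print(f"best window starts at residue {200 + start}, mean {mean:.2f}")
print(f"hydrophilic control mean {best_window(hydrophilic)[1]:.2f}")
```

A window mean above the assumed cut-off in the first fragment, against a negative mean in the control, is the kind of contrast such a scan relies on. It indicates a candidate region only and does not confirm membrane topology.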



Example 39

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 327):



1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT 

51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC 

101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG 

151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT 

201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA 

2 51 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA 
301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT 

3 51 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG 

4 01 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT 

4 51 ACTCAGGAAG GTCACAAATA CACCAAT TACCG 

501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC 

551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC . 

601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA 

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC

701 GCAAACGTCT . GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC 

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA 

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC 

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG 

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA 

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA 

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT 

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C . GGTCCGCT 

1101 TTTGGTCTAT CTC . . . 

This corresponds to the amino acid sequence (SEQ ID NO: 328; ORF88): 



1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV
151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM
301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF
351 SEVRSSGLQM TRSXGPLLVY L...

Further work revealed the complete nucleotide sequence (SEQ ID NO: 329): 



1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

2 01 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 
251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

3 01 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

3 51 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

4 01 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 
4 51 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 
801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG 
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2 001 CTTGAATCAT GACTGA 
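
The relationship between the partial sequence (SEQ ID NO: 327) and the complete sequence (SEQ ID NO: 329) can be checked by locating the start of the former within the latter and confirming that it lies in the same reading frame. A minimal Python sketch follows; the function name `locate_fragment` and its `full_offset` parameter are illustrative. The search is limited to the row of the complete listing that starts at position 701.

```python
def locate_fragment(full: str, fragment: str, full_offset: int = 1):
    """Return (1-based position, frame) of the first exact match, or None.

    full_offset is the listing position of full[0]; frame is 0 when the
    match starts on a codon boundary of an ORF beginning at position 1.
    """
    index = full.find(fragment)
    if index < 0:
        return None
    position = full_offset + index
    return position, (position - 1) % 3


# Row 701-750 of the complete ORF88-1 listing (SEQ ID NO: 329).
row_701 = "AGAGTGCGGATGTGGTTTTCCTGAATGCCGACAACGGGATATTGGTTCAG"
# First 30 nucleotides of the partial sequence (SEQ ID NO: 327).
partial_start = "GTGGTTTTCCTGAATGCCGACAACGGGATA"

hit = locate_fragment(row_701, partial_start, full_offset=701)
print(hit)
if hit is not None:
    position, frame = hit
    print("codon number:", (position - 1) // 3 + 1)
```

An exact-match search is sufficient here only because the first 30 nucleotides of the partial sequence contain no undetermined bases; regions of the partial sequence with unread positions would require an alignment that tolerates mismatches.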



This corresponds to the amino acid sequence (SEQ ID NO: 330; ORF88-1): 



1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF88 (SEQ ID NO: 328) shows 95.7% identity over a 371aa overlap with an ORF 
(SEQ ID NO: 332) from strain A of N. meningitidis: 



10 20 30
orf88.pep MVFLNADNGILVQDLPFEVKLKKFHIDFYN

orf88a
210 220 230 240 250 260

40 50 60 70 80 90
orf88.pep TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD

1 1 1 i i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 ! I ! 1 1 1 1 j 1 1 1 

orf 88a TGMPRDFASD I EVTDKATGEKLERT I RVNHPLTLHG I T I YQAS FADGGSDLTFKAWNLGD 

270 280 290 300 310 320 



100 110 120 130 140 150 

orf 88 . pep ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV 

1 1 1 II M II 1 1 1 1 II 1 1 1 1 1 1 Ml M III I II II I II 1 1 1 1 II M M II 1 1 II Mill 

orf 88a ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV 
330 340 350 360 370 380 



160 170 180 190 200 210 

or f 8 8 . pep TQEGHKYTNXXXXXXYR I RDAPGQAVE YKNYMLP VLQEQDYFW I TGTRSXLQQQ YRWLR I 

1 1 1 : ; 1 1 1 MINI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M llllllll 

orf 88a TQEGKKYTN I G PS I VYR I RDAAGQ AVE YKNYMLP VLQEQDYFW I TGTRSGLQQQ YRWLR I 

390 400 410 420 430 440 



220 230 240 250 260 270 

orf 88 . pep PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 

' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1' 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 . 

orf 88a PLDKQLKADTFMALREFLKDGEGRKRLVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 
450 460 470 480 490 500 



280 290 300 310 320 330 

orf 88 . pep GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM 

I I I I I I I I I I I I II I I I I II II I I I I I I I I I II lllllllllllllllllllll 

orf 88a GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM 
510 520 530 540 550 560 



340 350 360 370 

orf 88 . pep DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGP LLVYL 

1 1 1 1 1 1 1 M 1 1 ; 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I Mill 

orf 88a DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSPG ALLVYLGSVLLVLGTVLM FYVREKR 
570 580 590 600 610 620 

orf 88a AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX 
630 640 650 660 670 

The complete length ORF88a nucleotide sequence (SEQ ID NO: 331) is: 



1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

2 01 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

2 51 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 
301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

3 51 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

4 01 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 
4 51 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 
501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 
6 51 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 
801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG 
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

5 1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

10 1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

15 1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

20 1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 332): 

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE
151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*

ORF88a (SEQ ID NO: 332) and ORF88-1 (SEQ ID NO: 330) show 100.0% identity in 671 aa overlap:

or f 88a . pep MS KSRRS P PLLSRPWFAFFSSMRFAVALLS LLG I AS VI GTVLQQNQPQTDYLVKFGS FWA 60 

I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I , I I I I I I II II I I I II I II I I I 
orf 88- 1 MS KS RRSP PLLSRPWFAFFSSMRFAVALLS LLG I ASV I GTVLQQNQPQTDYLVKFGS FWA 60 

orf 88a .pep QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

45 1 1 1 1 1 1 I 1 1 1 1 I 1 1 I I I 1 1 I I I 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 1 I 1 1 I II 1 1 1 1 1 1 II II 1 1 1 1 II 

orf88-l QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKS FREKVKEKSLAAMRH 120 

orf 88a. pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

II MINIMI Mill Mill I II Mill Mill llllllllllllll MINIM II II 
orf 88-1 SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

50 orf 88a .pep GGL I DSNLLLKLGMLTGR I VPDNQAVYAKDFKPES I LGASNLS FRGNVNI S EGQS ADWF 240 

I II 1 1 1 II II II IM I II I II II III II II II 1 1 1 1 II II II 1 1 III II II I II I 

orf 88-1 GGL I DSNLLLKLGMLTGRIVPDNQAVYAKDFKPES I LGASNLS FRGNVNI SEGQSADWF 24 0 
orf88a.pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300

IMIIMIIMII IIIMIMI MIMIIill IIIIIIIIIIIMIIMIIIMIIIIM 

orf 88-1 LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf 88a . pep LHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 360 

5 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 E 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 

orf88-l LHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 360 

orf 88a. pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

1 1 1 1 1 M 1 1 J 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 i 1 1 1 1 1 1 1 1 9 1 1 1 1 1 1 M 1 1 

orf88-l SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

10 orf 88a. pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 II I M 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

or f 8 8 - 1 P VLQEQD YFW I TGTRSGLQQQ YRWLRI PLDKQLKADTFMALREFLKDGEGRKRLVADATK 4 80 

orf 88a .pep GAPAE I REQFMLAAENTLNI FAQKGYLGLDEF I TSN I PKEQQDKMQGYF YEMLYGVMNAA 540 

II I MM Ml II II II 1 1 1 1 1 1 II M Ml II 1 1 MM II 1 1 II II 1 1 M I 1 1 1 1 ill I 

] 5 orf 88 - 1 GAPAE I REQFMLAAENTLNI FAQKGYLGLDE F I TSN I PKEQQDKMQGYF YEMLYGVMNAA 54 0 

orf 88a .pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

I ■ 1 1 1 II 1 1 M I M 1 1 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 88-1 LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf 88a. pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

20 M I MM II MM II II II lllllll Mill II III II II MM II III 1 1 II II MM I 

orf 88- 1 PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 88a. pep LQRLGKDLNHD 672 

MIMIIill 
orf88-l LQRLGKDLNHD 6 72 

Homology with a predicted ORF from N. gonorrhoeae

ORF88 (SEQ ID NO: 328) shows 93.8% identity over a 371aa overlap with a predicted ORF 
(ORF88.ng) (SEQ ID NO: 334) from N. gonorrhoeae: 

orf 88 . pep MVFLNADNG I LVQDLPFEVKLKKFH I DFYNTGMPRDFASD I EVTDKATGEKLERT I RVNH 60 

MMMMMMMMIMM IIIMIIIMIIIIIIIMIII I IMIMIMM 

30 orf 88ng MVFLNADNGMLVQDLP FEVKLKKFH I DFYNTGMPRDFASD I EVTDKATGEKLERT I RVNH 60 

orf 88 .pep PLTLHGITIYQAS FADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 

MIMIMMIMMMIMI Mill I M 1 1 1 1 1 1 M 1 1 1 1 1 1 II 1 1 1 1 1 M 1 1 1 1 

orf 88ng PLTLHGITIYQASFADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFD 120 

orf 88 .pep QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180 

35 MINI MM III MM Mill II Mill 1 1 MM III MUM III II II I 

or f 8 8ng QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPS I VYR I RDAAGQAVE YKN 180 

■ orf 88 .pep YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 24 0 

I I I I : I I : : I I I I : I i I I I I I M II I I I II II I I I I I I I I I I I II I I I II II I M Ml 
or f 8 8 ng YMLPI LQDKDYFWLTGTRSGLQQQ YRWLRI PLDKQLKADTFMALREFLKDGEGRKRLVAD 24 0 

40 orf 88 .pep ATKGAPAE I REQFMLAAENTLNI FAQKGYLGLDEF I TSN I PKEQQDKMQGYFYEMLYGVM 300 

Ml 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M II 1 1 1 1 1 II I M 1 1 1 1 1 1 1 II II I II I M II II 1 1 1 

orf 88ng ATKDAPAE I REQFMLAAENTLNI FAQKGYLGLDEF I TSN I PKGQQDKMQGYFYEMLYGVM 300 
orf88.pep NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 360

Illllll I II I II M I I I I I I I I I I M I I I I I II I I i I M I 1 I II I I I I II I I I I I M 
or f 8 8ng NAALDET I RRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFS EVRS SGLQM 360 

or f 8 8 . pep TRSXGPLLVYL 371 

5 Ml I Mill 

orf 88ng TRSPGALLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 42 0 

An ORF88ng nucleotide sequence (SEQ ID NO: 333) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 334): 

1 MVFLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE
51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA
101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLNDVRAV
151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG
201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR
251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM
301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF
351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI
401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD*

Further work revealed the complete gonococcal DNA sequence (SEQ ID NO: 335):



1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG 

25 201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC 

251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG 

301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA 

4 01 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG 

30 4 51 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG 

501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651- GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

35 701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

40 951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG 

1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

45 1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

14 01 TGGGGAAGGG' CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG 

50 14 51 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC 

1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT 
1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC

1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG 

1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac 

1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag 

5 1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG 

1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA 

2001 CttgaaTCAT GACTga 

This corresponds to the amino acid sequence (SEQ ID NO: 336; ORF88ng-1):

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL
651 QKEFPKHVES LQRLGKDLNH D*

ORF88ng-1 (SEQ ID NO: 336) and ORF88-1 (SEQ ID NO: 330) show 97.0% identity in 671 aa
overlap:

orf 88-1 .pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGS FWA 60 

Mill II M 1 1 1 1 1 1 1 M M I U 1 1 1 1 1 1 1 M 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M I Ih 

orf 88ng-l MS KSRISPTLLSRPWFAFFSSMRFAVALLSLLGI AS VI GTVLQQNQPQTD YLVKFGPFWT 60 
30 orf 88-1 .pep QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

HI lllllllllllllllllll IIIIIIIIIIMiMIIIIMI I llllillllll 

orf 88ng-l RIFDFLGLYDVYASAWFVVIMMFLWSTSLCL I RNVPPFWREMKS FREKV KEKSLAAMRH 12 0 

orf 88-1 .pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 
I I I I M I I I I I I I I I I h I ' M I- I I I I I I I I I II I II I I I I i I I I I :| I I I ! I I 
35 orf 88ng-l SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICL 180 

orf 88-1 .pep GGL I DSNLLLKLGMLTGRIVPDNQAVYAKDFKPES I LGASNLS FRGNVNI SEGQSADWF -240 

I I I I I I I I I I I I : I I I I I I M I I I I I I I I I I I I I I I I I I I I h II I I I I I I I I I I I 
orf 88ng-l GGL I DSNLLLKLGMLAGRIVPDNQAVYAKDFKPES I LGASNLS FRGNVN I SEGQSADWF 24 0 

orf 88-1 .pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

40 | | | | | | : | | | || | | || | | | | M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 

orf 88ng- 1 LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

orf 88-1 .pep LHGI T I YQAS FADGGSDLTFKAWNLGDASREPWLKATS IHQFPLEIGKHKYRLEFDQFT 360 

M I i I I I I I I I I I I I I I I I I M I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I 
orf 88ng-l LHGI TI YQAS FADGGSDLTFKAWNLRDAS RE PWLKATS IHQFPLEIGKHKYRLEFDQFT 360 

45 orf 88-1 .pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

1 1 II I lllllllllllllllllll MM Mill I llllillllll II llllillllll II 

orf 88ng- 1 SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 
orf88-1.pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480

hll- I 1 1 h I 1 1 II M I 1 1 M 1 1 : M I M 1 1 M 1 1 1 II I 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 M 

or f 8 8ng- 1 P ILQDKDYFWLTGTRSGLQQQYRWLRI PLDKQLKADTFMALREFLKDGEGRKRLVADATK 4 80 

orf 88-1 .pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

5 I I I I I I II I I I I I I I I I I M I I I I I I I I I I I . I I II I I I'll II II II I I II I I 

orf 88ng-l DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 54 0 

orf 88-1 .pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

- 1 1 1 1 1 1 1 : 1 II I Ml 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I 1 1 M 1 1 1 

orf 88ng- 1 LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

10 orf 88-1 .pep PGALLWLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 66 0 

I I I I M I I I I I I I I I I I: I I I I I I ; I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 
orf 88ng- 1 PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 88-1. pep LQRLGKDLNHD 671 

IlillMIMI 
15 orf88ng-l LQRLGKDLNHD 671 

Furthermore, ORF88ng-1 (SEQ ID NO: 336) shows homology with a hypothetical protein (SEQ
ID NO: 1134) from Aquifex aeolicus:

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Score = 94.4 bits (231), Expect = 2e-18
Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)

Query: 16 FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 74
+ F +S++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S
Sbjct: 80

Query: 75
++++ ++ L V+ C 1+ +P W++ S +E++ + A +H + VKI P+ K
Sbjct: 140

Query: 135 -RYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192
+ +L +GF+ V E + + A+KG ++ G +AL+VI G LID
Sbjct: 198

Query: 193
+I+G RG++ ++EG + DV+ + A+ L
Sbjct: 250 -AIVGV RGSLIVAEGDTNDVMLVGAE--QKPYKL 280

Query: 253 EVKLKKFH IDFY---NTGMPRDFA SDIEVTDKATGEKLER--TIRVNHPLT 300
PF V L F I Y N + + FA SDIE+ + G K+E T++VN P
Sbjct: 281

Query: 301
++QA++ DG S + + + A +P
Sbjct: 338
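The summary statistics reported for a BLAST hit, such as those quoted for the Aquifex aeolicus match, can be checked mechanically. The following Python fragment is a minimal sketch and not part of the original analysis: the function name `parse_blast_summary` is hypothetical, and the observation that each reported percentage equals the integer part of the ratio is drawn only from the three figures quoted for this hit.

```python
import re

def parse_blast_summary(line):
    """Parse 'Name = n/d (p%)' fields from a BLAST summary line.

    Returns a dict mapping each field name to (count, length, reported_percent).
    Hypothetical helper for illustration only.
    """
    stats = {}
    for name, num, den, pct in re.findall(r"(\w+) = (\d+)/(\d+) \((\d+)%\)", line):
        stats[name] = (int(num), int(den), int(pct))
    return stats

# Summary line quoted for the Aquifex aeolicus hit
summary = "Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)"
stats = parse_blast_summary(summary)

for name, (count, length, reported) in stats.items():
    # Integer part of the percentage, recomputed from the raw counts
    recomputed = 100 * count // length
    print(f"{name}: {count}/{length} -> {recomputed}% (reported {reported}%)")
```

For this hit the recomputed values agree with the reported 27%, 47% and 17%.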



Based on this analysis, including the putative transmembrane domain in the gonococcal protein, it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
Example 40

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID 
NO: 337): 



1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC AAAAAa TATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

3 51 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

4 01 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 338; ORF89): 



1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 
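The X residues in ORF89 arise where the nucleotide sequence carries an ambiguity code (for example m, meaning A or C, or y, meaning C or T) and the alternative codons would encode different amino acids. The following Python fragment is a minimal sketch of that translation rule, given for illustration only. It assumes the standard genetic code and the IUPAC nucleotide ambiguity codes; the names `CODON_TABLE`, `IUPAC` and `translate` are hypothetical helpers and do not describe the software used for the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes mapped to the bases they may represent
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}

def translate(dna):
    """Translate a DNA string in frame 1; ambiguous codons give X unless
    every possible codon encodes the same amino acid."""
    seq = "".join(dna.split()).upper()  # drop the spacing used in the listing
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        # Expand the codon into every unambiguous codon it could stand for
        options = {CODON_TABLE["".join(p)]
                   for p in product(*(IUPAC[b] for b in codon))}
        protein.append(options.pop() if len(options) == 1 else "X")
    return "".join(protein)

# First 60 nucleotides of SEQ ID NO: 337 (contains the ambiguity code m)
print(translate("ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT GATAGTCGTC"))
```

Applied to the first 60 nucleotides of SEQ ID NO: 337 this gives MMSNXMXQKGFTLIXXMIVV, with X at each codon containing m.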



Further work revealed the complete nucleotide sequence (SEQ ID NO: 339): 



1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

2 01 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA 

2 51 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

3 01 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

4 01 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 
4 51 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 



This corresponds to the amino acid sequence (SEQ ID NO: 340; ORF89-1): 



1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV 

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS 

151 DVGCEAFSNR KK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with PilE of N. gonorrhoeae (accession number Z69260) (SEQ ID NO: 1135).



ORF89 (SEQ ID NO: 338) and PilE protein (SEQ ID NO: 1135) show 30% aa identity in 120 aa
overlap:
orf89 8 QKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66
QKGFTLI MIV+AI+GI++ +A+P+Y Y+ S + G + ++L++
PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64

orf89 67
DN + +G + KI KY SV + GV K G LS+W
PilE 65

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF89 (SEQ ID NO: 338) shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) 
(SEQ ID NO: 342) from strain A of N. meningitidis: 



10 10 20 30 40 50 60 

orf 89. pep MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 

1 1 1 1 I MINIMI II Ml MMMMMMMMI MINIM 

orf 89a MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

10 20 30 40 50 60 

15 70 80 90 100 110 120 

orf 89 .pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

1 1 1 1 1 M I II I MM I i I M I M M 1 1 1 1 M Ml 1 1 M MM III Mlllhllll 

orf 89a ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

70 80 90 100 110 120 

20 130 140 150 160 

orf 8 9 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

Mill Illlllll MMMMMMMMI MIIIIMI 

orf 89a TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 

130 140 150 160 






The complete length ORF89a nucleotide sequence (SEQ ID NO: 341) is: 



1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT 

51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT 

101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

30 151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA 

251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC 

301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

35 401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 342): 

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM 

40 51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV 

101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS 

151 DVGCEAFSNR KK* 

ORF89a (SEQ ID NO: 342) and ORF89-1 (SEQ ID NO: 340) show 83.3% identity in 162 aa
overlap:
10 20 30 40 50 60

orf 89a. pep ' MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

IIMIIIIMIIII II III I I I I I I I I I I I I II I I I I I I I I I I I 

orf 89-1 MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 

5 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 8 9a. pep ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 

I M I ' I I M I I - I M I I I I I U I I I I I ,:| M I M i-M III Mlllhllll 
orf 89-1 ILKNPLDDNQTIENKLEIF'VSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
10 70 80 90 100 110 120 

130 140 150 160 

orf 8 9a .pep TLSVWMNSVGDGYKCRDAASARAHLETLSSDVGCEAFSNRKKX 

III IIIIMIMIII MIMIIIIII MIMIMMII. 

orf 89-1 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
15 130 140 150 160 

Homology with a predicted ORF from N. gonorrhoeae

ORF89 (SEQ ID NO: 338) shows 84.6% identity over a 162aa overlap with a predicted ORF 
(ORF89.ng) (SEQ ID NO: 344) from N. gonorrhoeae: 



orf89 MMSNXMXQKGFTLIXXMIVVAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60

20 MM I I II II II I II h 1 1 II I II 1 1 II II II II 1 1 1 1 1 M 1 1 M 1 1 1 h Ml 

orf89ng MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60

orf 8 9 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 12 0 

Mill IIM:- MMIUIMI IIIIIIMMI II MIMMIhlllM 

orf 89ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 

25 orf89 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162 

1 1 1 1 1 M 1 1 1 1 1 1 ! 1 1 M 1 1 1 : Ml hi MUM II II 

or f 8 9ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKK 162 

The complete length ORF89ng nucleotide sequence (SEQ ID NO: 343) is: 

30 1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA 

2 01 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA 
35 251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC 

301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG 

3 51 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

4 01 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA 
451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG 






This encodes a protein having amino acid sequence (SEQ ID NO: 344):



1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV 

101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA 

45 151 DSGCEAFSNR KK* 
This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng (SEQ ID NO: 344) and ORF89-1
(SEQ ID NO: 340) show 88.3% identity in 162 aa overlap:

5 10 20 30 40 50 60 

orf89-1.pep MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF
I I I M I I I I I I I I I .1 I I I I I I I I I I I I I I I I I I I I I I I i I I M M I ' h III 
orf89ng MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF

10 20 30 40 50 60 

10 70 80 90 100 110 120 

or f 8 9 - 1 . pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

Mill I M - I M I I ! I I I I I I M I I ! M M M I I II MIIIMIhillll 
orf 8 9ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 

70 80 90 100 110 120 

15 130 140 150 160 

orf 89-1 .pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

Ml III 1 1 MM II I hlllh :|lhl MINI II III 

orf 8 9ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 

130 140 150 160 


Based on this analysis, including the gonococcal motifs and the homology with the known PilE 
protein (SEQ ID NO: 1135), it was predicted that these proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

ORF89-1 (SEQ ID NO: 340) (13.6 kDa) was cloned in the pGex vector and expressed in E. coli, as
described above. The products of protein expression and purification were analyzed by SDS-PAGE.
Figure 11A shows the results of affinity purification of the GST-fusion protein. Purified
GST-fusion protein was used to immunise mice, whose sera gave a positive result in the ELISA
test, confirming that ORF89-1 (SEQ ID NO: 340) is a surface-exposed protein, and that it is a
useful immunogen.

Example 41 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 345): 



1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA
251 AACAAGCGTT GGCCn.AGAA TTTCAACCC...

This corresponds to the amino acid sequence (SEQ ID NO: 346; ORF91):

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP . . . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 347): 

10 1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

2 01 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 
15 2 51 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

3 51 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

4 01 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 
451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

20 501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence (SEQ ID NO: 348; ORF91-1): 

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF91 (SEQ ID NO: 346) shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) 
(SEQ ID NO: 350) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf 91 . pep MKKS S L I S ALG I G I LS I GMAFAAPADAVS Q I RQNATQVL S I L KNGDANTARQKAEAYA I P 
35 | | | | | : | | | | | | | | | | | | | | | | | | | M I : I I I I I I I I I I I I I h I I I I I I I I I I I I I I I I 

orf 91a MKKS S F I SALG I GI LS IGMAFAAPADAVNQ I RQNATQVLS I LKSGDANTARQKAEAYAI P 

10 20 30 40 50 60 

70 80 90 

or f 9 1 . pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 

40 llllllllillMIII I II llllll III 

or f 9 la YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPI VN 

70 80 90 100 110 120 

orf 91a KGGKEI IVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEI IKAK 

130 140 150 160 170 180 
The complete length ORF91a nucleotide sequence (SEQ ID NO: 349) is:



1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA 

5 101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC 

10 351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG 

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC 

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC 

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG 

551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A 






This encodes a protein having amino acid sequence (SEQ ID NO: 350):



1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA
51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK*



ORF91a (SEQ ID NO: 350) and ORF91-1 (SEQ ID NO: 348) show 98.0% identity in 196 aa 
overlap: 



10 20 30 40 50 60 

25 orf 91a. pep MKKSSF I SALGIGILSIGMAFAAPADAVNQ IRQNATQVLS I LKSGDANTARQKAEAYAIP 

I I I hM I I I I I I I I I I M I M I I I M I I I I I I I I M I I : I I I I I I I I I II I I II 
orf 91 - 1 MKKS SL I SALGI GI LS IGMAFAAPADAVSQ IRQNATQVLS I LKNGDANTARQKAEAYA I P 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf91a.pep  YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf91-1     YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN

70 80 90 100 110 120 



130 140 150 160 170 180 

35 orf 91a. pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 

I I I I I I I I I I I I I I I I I I I I I M I I i I I I I I I I M I I I I I I I I I I I I I I I I I I I I II 
orf 91-1 KGGKEI IVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEI IKAK 

130 140 150 160 170 180 

190 

40 orf 9 la . pep GVDGLIAELKAKNGSKX 

IIIIIIIIIIMIhll 
orf 91-1 GVDGLIAELKAKNGGKX 

190 
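The identity figure quoted for an ungapped, equal-length overlap such as this one is the fraction of aligned positions at which the two sequences carry the same residue. The following Python fragment is a minimal sketch that recomputes the figure for ORF91a and ORF91-1 from the sequences given for SEQ ID NO: 350 and SEQ ID NO: 348. The function name `percent_identity` is hypothetical, and the approach assumes an alignment without gaps, so it is not a substitute for an alignment program.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, length, percent) for two ungapped, equal-length sequences."""
    a = "".join(seq_a.split())  # drop the spacing used in the listing
    b = "".join(seq_b.split())
    if len(a) != len(b):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a), 100.0 * matches / len(a)

# ORF91a (SEQ ID NO: 350), stop codon omitted
ORF91A = """
MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA
RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK
"""

# ORF91-1 (SEQ ID NO: 348), stop codon omitted
ORF91_1 = """
MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA
RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG
GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK
"""

matches, length, pct = percent_identity(ORF91A, ORF91_1)
print(f"{matches}/{length} identical positions = {pct:.1f}% identity")
```

The two sequences differ at four of 196 positions, which reproduces the 98.0% identity stated for this overlap.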



Homology with a predicted ORF from N. gonorrhoeae 
ORF91 (SEQ ID NO: 346) shows 84.8% identity over a 92aa overlap with a predicted ORF
(ORF91.ng) (SEQ ID NO: 352) from N. gonorrhoeae: 

orf 91 .pep M KKS S L I S ALG I G I LS I GMAF AAPAD AVS Q I RQNATQVLS I LKNGD ANT ARQ KAEAY A I P 60 

:| I I I : I I I I I I M I I I I I I I hi II I hi i I I I I I M h I I -I! I h llllllh 
or f 9 1 ng VKKS S F I S ALG IGILSI GMAFAS PADAVGQ I RQNATQVLT I LKSGDAASARPKAEAYAVP 6 0 

orf 91 .pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 93 

IMIIIIIIIIIIIII I II hilll Ml 
orf 91ng YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 12 0 

The complete length ORF91ng nucleotide sequence (SEQ ID NO: 351) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 352): 

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA
51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 353):

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA 

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA 

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC 

3 01 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC 
351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA 

4 01 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC 
4 51 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC 
501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG 
551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence (SEQ ID NO: 354; ORF91ng-1):

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA
51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS
101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG
151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

ORF91ng-1 (SEQ ID NO: 354) and ORF91-1 (SEQ ID NO: 348) show 92.3% identity in 196 aa
overlap:

10 20 30 40 50 60 

orf 91-1 .pep MKKS SL I S ALG IGILSI GMAFAAPADAVSQ I RQNATQVLS I LKNGDANTARQKAEAYAI P 
I I I I hi I h I I I I I I II I I II h h I hi h h I h I hh h I I h I I It I : I 
orf 91ng-l MKKS S F I S ALG IGILSI GMAFAS PADAVGQ I RQNATQVLT I LKSGDAASARPKAEAYAVP 

10 20 30 40 50 60 

                    70        80        90       100       110       120
orf91-1.pep YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKLKNANVNVKDNPIVN
orf91ng-1   YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf91-1.pep KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK
orf91ng-1   KGGKEIVVRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK
                   130       140       150       160       170       180

                   190
orf91-1.pep GVDGLIAELKAKNGGKX
            |:||||||||||||||
orf91ng-1   GIDGLIAELKAKNGGKX
                   190

In addition, ORF91ng-1 (SEQ ID NO: 354) shows homology to a hypothetical E. coli protein (SEQ
ID NO: 1136):



sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC REGION
PRECURSOR (F211) >gi|606130 (U18997) ORF_f211 [Escherichia coli] >gi|1789583
(AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic region [Escherichia
coli] Length = 211

Score = 70.6 bits (170), Expect = 6e-12 

Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%) 

Query: 59 VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118 

+PY + AL +G + + +A+ AQ++A F+L + Y + + T + P 
Sbjct: 65 LPYVQVKYAGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA--PE 122 

Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG--GKYRTYNVAIEGTSLVTVYRNQFG 174

G K IV +R + P G+ PV +DF + + G + + Y++ EG S ++T +N++G 
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182 

Query: 175 EIIKAKGIDGLIAELKA 191 

+++ KGIDGL A+LK+ 
Sbjct: 183 TLLRTKGIDGLTAQLKS 199 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 42 



The following DNA sequence was identified in N. meningitidis (SEQ ID NO: 355): 



1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA 

101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn 

151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA 
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA

451 AAACTGATAC AAAAAACCGT AGGCGAATAA 
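
The sequence listings here are printed as numbered rows of ten-residue groups. As a minimal sketch (the function name `parse_numbered_block` is hypothetical and is not part of the original disclosure), such rows can be joined into one continuous string while checking that each printed position number agrees with the running length:

```python
import re


def parse_numbered_block(text):
    """Join numbered rows such as '451 AAACTGATAC ...' into one sequence.

    Assumes every non-empty row starts with the 1-based position of its
    first residue, followed by whitespace-separated residue groups.
    """
    parts = []
    length = 0
    for row in text.splitlines():
        row = row.strip()
        if not row:
            continue
        match = re.match(r"(\d+)\s+(.*)$", row)
        if match is None:
            raise ValueError(f"row has no position number: {row!r}")
        start = int(match.group(1))
        residues = "".join(match.group(2).split())
        # A mismatch here usually means a row is missing or mis-numbered.
        if start != length + 1:
            raise ValueError(f"row starts at {start}, expected {length + 1}")
        parts.append(residues)
        length += len(residues)
    return "".join(parts)


# First two rows of SEQ ID NO: 355 as printed above.
rows = """
1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA
"""
fragment = parse_numbered_block(rows)
print(len(fragment))  # 100
```

The position check is a design choice for this sketch: it makes a dropped or mis-numbered row fail loudly instead of silently shifting the reading frame.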

This corresponds to the amino acid sequence (SEQ ID NO: 356; ORF97): 



1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX 
51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE

151 KLIQKTVGE* 
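
The correspondence between a nucleotide sequence and its amino acid sequence can be illustrated with a short translation routine. This is a sketch under stated assumptions: it uses the standard genetic code, reads from the first base in frame, and reports any codon containing an ambiguous base (such as `n`) as `X`, which is a simplification. The names `CODON_TABLE` and `translate` are illustrative only.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First row (nucleotides 1-50) of SEQ ID NO: 355; the last two bases
# do not complete a codon and are ignored.
first_row = "ATGAAACACATACTCCCCCTGATTGCCGCATCCGCACTCTGCATTTCAAC"
print(translate(first_row))  # MKHILPLIAASALCIS

# Last row (nucleotides 451-480), which ends in the TAA stop codon.
last_row = "AAACTGATACAAAAAACCGTAGGCGAATAA"
print(translate(last_row))  # KLIQKTVGE*
```

Both outputs agree with the start and end of the ORF97 amino acid sequence shown above.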

Further work revealed the complete nucleotide sequence (SEQ ID NO: 357): 



1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG 

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA

451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence (SEQ ID NO: 358; ORF97-1):



1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
151 KLIQKTVGE* 


Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 



ORF97 (SEQ ID NO: 356) shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) 
(SEQ ID NO: 360) from strain A of N. meningitidis: 



                     10        20        30        40        50        60
orf97.pep    MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG
             | |||||  |||||||||| |||||| ||||||| |||| |||||         ||||||
orf97a       MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf97.pep    MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
             |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97a       MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
                     70        80        90       100       110       120

                    130       140       150       160
orf97.pep    VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
             |||||||||||||||||||||||||||||||||||| |||
orf97a       VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX
                    130       140       150       160
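
The stated identity can be checked by counting identical columns between the two aligned sequences. The sketch below is illustrative only: `percent_identity` is a hypothetical helper, it assumes the two strings are already aligned to equal length, and it counts an `X` (undetermined residue) opposite a determined residue as a mismatch, which is the convention that reproduces the figure quoted above.

```python
def percent_identity(a, b):
    """Percentage of aligned columns in which both sequences carry the
    same residue; gap characters ('-') never count as identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    same = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * same / len(a)


# ORF97 (SEQ ID NO: 356) and ORF97a (SEQ ID NO: 360), without the stop.
orf97 = (
    "MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXX"
    "XXXXAIKSKGMDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVK"
    "DPAFALQLPLRVLVTETDGKVRAAYTDTRALIAGSRIGFDEVANTLANAE"
    "KLIQKTVGE"
)
orf97a = (
    "MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVS"
    "RLETAIKSKGMDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVK"
    "DPAFALQLPLRVXVTETDGKVRAAYTDTRALIAGSRIGFDEVANTLANAE"
    "KLIQKTIGE"
)
print(round(percent_identity(orf97, orf97a), 1))  # 88.7
```

Here 141 of the 159 columns are identical, which rounds to the 88.7% given in the text.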

The complete length ORF97a nucleotide sequence (SEQ ID NO: 359) is: 



1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC 

51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA 

101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA 

251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA 

301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG

401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCAT AGGCGAATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 360): 

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS 

51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK

101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 

151 KLIQKTIGE* 

ORF97a (SEQ ID NO: 360) and ORF97-1 (SEQ ID NO: 358) show 95.6% identity in a 159 aa
overlap:



                     10        20        30        40        50        60
orf97a.pep   MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
             | |||||  |||||||||| |||||| |||||||||||||||||||||||||||||||||
orf97-1      MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf97a.pep   MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK
             |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf97-1      MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
                     70        80        90       100       110       120

                    130       140       150       160
orf97a.pep   VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX
             |||||||||||||||||||||||||||||||||||| |||
orf97-1      VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
                    130       140       150       160

Homology with a predicted ORF from N. gonorrhoeae

ORF97 (SEQ ID NO: 356) shows 88.1% identity over a 159aa overlap with a predicted ORF 
(ORF97.ng) (SEQ ID NO: 362) from N. gonorrhoeae: 



orf97.pep    MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60
             |||||| ||||| ||||||||||  | ||||||| |||| |||||         ||||||
orf97ng      MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60

orf97.pep    MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf97ng      MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120

orf97.pep    VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE 159
             || ||||||||| |||| |||||||||||||||||||||
orf97ng      VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGE 159

The complete length ORF97ng nucleotide sequence (SEQ ID NO: 361) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 362): 



1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 363): 



1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC 

51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA 

101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC 

151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT 

201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA 
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC 

351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG
401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA 

451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence (SEQ ID NO: 364; ORF97ng-1):



1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE 
151 KLIQKTVGE*

ORF97ng-1 (SEQ ID NO: 364) and ORF97-1 (SEQ ID NO: 358) show 96.2% identity in a 159 aa
overlap:



                     10        20        30        40        50        60
orf97-1.pep  MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
             |||||||||||||||||||||||  | |||||||||||||||||||||||||||||||||
orf97ng-1    MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf97-1.pep  MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf97ng-1    MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK
                     70        80        90       100       110       120

                    130       140       150       160
orf97-1.pep  VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGEX
             || ||||||||| |||| ||||||||||||||||||||||
orf97ng-1    VRTAYTDTRALIVGSRISFDEVANTLANAEKLIQKTVGEX
                    130       140       150       160







Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF97-1 (SEQ ID NO: 358) (15.3 kDa) was cloned in pET and pGex vectors and expressed in
E. coli, as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figures 12A & 12B show, respectively, the results of affinity purification of the
GST-fusion and His-fusion proteins. Purified GST-fusion protein was used to immunise mice, whose
sera were used for Western blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure
12D). These experiments confirm that ORF97-1 (SEQ ID NO: 358) is a surface-exposed protein,
and that it is a useful immunogen.



Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1 (SEQ 
ID NO: 358). 



Example 43 

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID
NO: 365):

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC
51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA
101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC
151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg
201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG
251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT
301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT
351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA
401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT
451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC
501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC
551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA



This corresponds to the amino acid sequence (SEQ ID NO: 366; ORF106):



1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 
51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 
101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 
151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK*

Further work revealed the following DNA sequence (SEQ ID NO: 367): 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT
451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 
501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA

This corresponds to the amino acid sequence (SEQ ID NO: 368; ORF106-1): 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF106 (SEQ ID NO: 366) shows 87.4% identity over a 199aa overlap with an ORF (ORF106a)
(SEQ ID NO: 370) from strain A of N. meningitidis: 

                     10         20        30        40        50       59
orf106.pep   MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ
             |||||||||| | ||    ||      |||||||||||||| ||||||  ||||||||||
orf106a      MAFITRLFKSIKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ
                     10        20        30        40        50        60

            60        70        80        90       100       110      119
orf106.pep   LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA
             || |  ||| || || ||||||||||||| ||||||||| ||||||||||| ||||||||
orf106a      LQXAXXRGVXLNXTLXWQLSAPIIASYRFXLGQLIGDDDXIDYKLSFHPLTNRYRVTVGA
                     70        80        90       100       110       120

           120       130       140       150       160       170      179
orf106.pep   FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT
             ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf106a      FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT
                    130       140       150       160       170       180

           180       190      199
orf106.pep   SQNWHLDSGWKPLNIIGNKX
             ||||||||||||||||||||
orf106a      SQNWHLDSGWKPLNIIGNKX
                    190       200
Due to the K→N substitution at residue 111, the homology between ORF106a (SEQ ID NO: 370)
and ORF106-1 (SEQ ID NO: 368) is 87.9% over the same 199 aa overlap.
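
The effect of a single corrected residue on the quoted percentage can be reproduced with simple arithmetic. The sketch below is illustrative only: the helper name is hypothetical, and it assumes the published percentage was rounded from a whole number of identical columns over the stated overlap.

```python
def identity_with_extra_matches(percent, overlap, extra=1):
    """Recover the number of identical columns implied by a rounded
    percentage, then recompute the percentage after `extra` additional
    columns become identical over the same overlap."""
    matches = round(percent / 100 * overlap)
    return matches, round(100 * (matches + extra) / overlap, 1)


# ORF106 vs ORF106a: 87.4% of 199 columns, then one more match (K→N at 111).
print(identity_with_extra_matches(87.4, 199))  # (174, 87.9)

# ORF106 vs ORF106ng: 90.5% of 199 columns, then one more match.
print(identity_with_extra_matches(90.5, 199))  # (180, 91.0)
```

In both comparisons the strain A and gonococcal sequences carry N at this position, so replacing K with N in ORF106-1 adds exactly one identical column.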

The complete length ORF106a nucleotide sequence (SEQ ID NO: 369) is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT

551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA

This encodes a protein having amino acid sequence (SEQ ID NO: 370): 

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF106 (SEQ ID NO: 366) shows 90.5% identity over a 199aa overlap with a predicted ORF 
(ORF106.ng) (SEQ ID NO: 372) from N. gonorrhoeae:

orf106.pep   MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 59
             |||||||||| | ||     |      |||||   |||||||||| ||||||||||||||
orf106ng     MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ 60

orf106.pep   LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 119
             |||||||||||||||||||||| |||||||||||||||||||||||||||| ||||||||
orf106ng     LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 120

orf106.pep   FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 179
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf106ng     FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 180

orf106.pep   SQNWHLDSGWKPLNIIGNK 198
             |||||||||||||||||||
orf106ng     SQNWHLDSGWKPLNIIGNK 199

Due to the K→N substitution at residue 111, the homology between ORF106ng (SEQ ID NO: 372)
and ORF106-1 (SEQ ID NO: 368) is 91.0% over the same 199 aa overlap.
The complete length ORF106ng nucleotide sequence (SEQ ID NO: 371) is:



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC
451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC
501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 
551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 



This encodes a protein having amino acid sequence (SEQ ID NO: 372):

1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK*

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF106-1 (SEQ ID NO: 368) (18 kDa) was cloned in pET and pGex vectors and expressed in
E. coli, as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figure 13A shows the results of affinity purification of the His-fusion protein, and
Figure 13B shows the results of expression of the GST-fusion in E. coli. Purified His-fusion protein
was used to immunise mice, whose sera were used for FACS analysis (Figure 13C). These
experiments confirm that ORF106-1 (SEQ ID NO: 368) is a surface-exposed protein, and that it is
a useful immunogen.

Example 44

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID
NO: 373):

1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT 

1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 

1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA

This corresponds to the amino acid sequence (SEQ ID NO: 374; ORF10): 



1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT 

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA

451 GCILRHRKDL HKLFHYLKKQ GFPL* 

Further sequence analysis revealed the complete DNA sequence (SEQ ID NO: 375) to be:



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 
1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC
1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA

This corresponds to the amino acid sequence (SEQ ID NO: 376; ORF10-1): 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR

201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE

401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG

451 CILRHRKDLH KLFHYLKKQG FPL* 

Computer analysis of this amino acid sequence gave the following results: 
Prediction 

ORF10-1 (SEQ ID NO: 376) is predicted to be the precursor of an integral membrane protein,
since it comprises several (12-13) potential transmembrane segments, and a probable cleavable
signal peptide.
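
The text does not state which method produced this prediction. As a hedged illustration of how potential transmembrane segments are commonly flagged, the sketch below averages the Kyte-Doolittle hydropathy scale over a sliding window. The scale, the 19-residue window and the 1.6 cutoff are conventional choices assumed here, not values taken from this document, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not from this document).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; unknown residues score 0.0."""
    scores = [KD.get(residue, 0.0) for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean reaches the cutoff."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, value in enumerate(profile) if value >= threshold]


# Residues 1-50 of ORF10-1 (SEQ ID NO: 376).
fragment = "MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTV"
print(candidate_windows(fragment))
```

Runs of consecutive start positions returned by `candidate_windows` mark hydrophobic stretches long enough to span a membrane; counting such runs along the full sequence gives a rough estimate of the number of potential segments.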

Homology with EpsM (SEQ ID NO: 1137) from Streptococcus thermophilus (accession number
U40830).

ORF10 (SEQ ID NO: 374) shows homology with the epsM gene of S. thermophilus, which
encodes a protein (SEQ ID NO: 1137) of a size similar to ORF10 and is involved in
exopolysaccharide synthesis. Other homologies are with prokaryotic membrane proteins:

Identities = (25%) 

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270

L Y +PL SS+ +W L ++ R F+ + G G+ ++ + +IF+ W

Sbjct: 210 LYYALPLIPSSILWWLLNASSRYFVLFFLGAGANGLLAVATKIPSIISIFNTIFTQAW 267

Identities = 15/57 (26%), Positives = 31/57 (54%) 



Query: 7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 
L + G++GS +L + + + PL ++ + G L QT A L + ++ + + A +R

Sbjct: 12 LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68 

Identities = 16/96 (16%), Positives = 36/96 (37%) 

Query: 307 IFSPIASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIXXXXXXXXXX 366
+ P+ ++ +YA+ V ML LF + ++ G ++T+ +

Sbjct: 305 VLKPIVEKWSSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSIYGTIV 364 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF10 (SEQ ID NO: 374) shows 95.4% identity over a 475aa overlap with an ORF (ORF10a)
(SEQ ID NO: 378) from strain A of N. meningitidis:



                     10        20        30        40        50        60
orf10.pep    MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
             ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10a       MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf10.pep    YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
             ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10a       YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf10.pep    LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA
             |||||||||||||||||||||||||||| ||||||| |||||||||||||||||||||||
orf10a       LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf10.pep    NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY
             |||||||||||||||||||| |||| |||||| |||||||||||||||||||||||||||
orf10a       NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf10.pep    AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS
             ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf10a       AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS
                    250       260       270       280       290       300

                    310       320       330       340       350       360
orf10.pep    ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT
             ||| ||||||||||||||||||||||||||| ||||||| ||||||||||||||||||||
orf10a       ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT
                    310       320       330       340       350       360

                    370       380        390       400       410      419
orf10.pep    LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT
             |||||||||||||  |||     |||||||||||||| ||||||||||||||||||| ||
orf10a       LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT
                    370         380       390       400       410

           420       430       440       450       460       470
orf10.pep    LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX
             |||| |||||||||||||||||||||| ||||||||||||||||||||||||||||
orf10a       LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX
            420       430       440       450       460       470

The complete length ORF10a nucleotide sequence (SEQ ID NO: 377) is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG 

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG 

301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC 

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC 

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC 

1351 TGCATCCTGC GCGACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA

1401 AAAACAAGGT TTCCCATTAT GA

This encodes a protein having amino acid sequence (SEQ ID NO: 378): 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG
451 CILRHRKDLH KLFHYLKKQG FPL* 

ORF10a (SEQ ID NO: 378) and ORF10-1 (SEQ ID NO: 376) show 95.4% identity in a 475 aa
overlap:

10 20 30 40 50 60 

orf 10-1 .pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
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orf 10a 



Mil II I MM III Mil III MM Mill 1 1 MM Ml I Ml I II! Ml INI III II 

MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 



70 80 90 ' 100 110 120 

orf 10-1 .pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

Ml II MINIM llllllll MINI I MINI I Ml I II II 1 1 lllllll I M 

orf 10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 



10 



130 140 150 160 170 180 

orf 10-1 .pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 

IMIMIIIM llllllll Mill lllllll I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 

orf 10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 



15 



190 200 210 220 230 240 

orf 10-1 .pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 

I I I I I I I I I I I I I I I I I I I I : I I I I illlll I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 1 0 a NLAAAAFLLFQNRCRLKAVRRAP FS S AVLHRGLRYG IPIALSS I AYWGLAS ADRL FLKKY 

190 200 210 220 230 240 



20 



250 260 270 280 290 300 

orf 10-1 .pep AGLEQLGVYSMG I S FGGAALLFQS I FSTVWTP Y I FRAI EENAPP ARLS ATAES AAALLAS 

II II II 1 1 M II II II II 1 1 1 1 II I II I M II II 1 1 II I M M II II 1 1 II I II 1 1 II 

orf 10a AGLEQLGVYSMG I S FGGAALLFQS I FSTVWTP Y I FRAI EANAPP ARLS ATAES AAALLAS 

250 260 270 280 290 300 



25 



310 320 330 340 350 360 

orf 10 - 1 . pep ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 

III 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 IIIMMMIM MINIM III 

orf 10a ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLVEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 



30 



370 380 390 400 410 419 

orf 10 - 1 . pep LGALAANLLLLGLDRAVPAR - PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
llllllllll I llh MMMMMIMNMIIMM llllllll MM 

orf 10a LGALAANLLLLGL - -AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 






420 430 440 450 460 470 

orf 10 - 1 . pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 

MINIM II IIIIIIIIMIIIIIIMIIIIIIII llllllll IIIIIMM 

orf 10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 



Homology with a predicted ORF from N. gonorrhoeae 

ORF10 (SEQ ID NO: 374) shows 94.1% identity over a 475aa overlap with a predicted ORF
(ORF10.ng) (SEQ ID NO: 380) from N. gonorrhoeae:



orf10ng.pep  MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60
             ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf10nm      MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 60
orf10ng.pep YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120

1 1 1 i 1 1 1 ^ 1 1 1 1 1 1 1 1 1 [ 1 1 1 ! 1 1 MIIMMMMIMIMIMMMIMIIIIMI 

orflOnm YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

orf lOng . pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180 

5 INI III MMIIIIIIII IIMMMIIIIIMM I II 1 1 1 1 1 1 II I h 1 1 1 1 1 1 II I 

orflOnm LSFLPIRFLLLVLRMEGRALAFSSAQLVPKIAILLLXPLTVGLLHFPANTAVLTAVYALA 180 

orflOng.pep NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 240 

MM III Mil III MM 1 1 = 111 II II 1 1 II III I hi 1 1 hi III MM II II III I 

orflOnm NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGI PIALSS I AYWGLASADRLFLKKY 240 

10 orflOng.pep AGLEQLGVYSMGI S FGGAALLLQS I FSTVWTPY I FRAI EENATPARLS ATAESAAALLAS 300 

I I I I I I I I I I I I I I M I I It I : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I M 1 1 1 1 1 1 1 1 1 1 1 1 M 

orflOnm AGLEQLGVYSMG I SFGGAALLFQS I FSTVWTPY I FRA I EENAPPARLS ATAESAAALLAS 300 

orflOng.pep ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIALAT 360 

III 1 : 1 1 1 li 1 1 1 II 1 1 M 1 1 i Mill 1 1 1 1 I : II M 1 1 1 1 1 1 1 1 1 II 1 i I 

15 orflOnm ALCXTGIFSPLASLLLPENYAAVRFIWSCMXPPLFCTLAEISGIGLNWRKTRPIALAT 360 

370 380 390 400 410 

orf lOng . pep LGALAANLLLLGL- -AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

lllllllllllll Mh I I I I I I I I I I I I M I II I I I I I I I i I i M I I I I 

orflOnm LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
20 370 380 390 400 410 

420 430 440 450 460 470 ' 

orf lOng . pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

IIM IIIIIIIIIMIMMMIIIII IIIIIIIMM IIIIIMIIII 

orflOnm LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
25 420 430' 440 450 460 470 

The complete length ORF10ng nucleotide sequence (SEQ ID NO: 379) is:

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC
51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG
101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG
151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC
201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC
251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT
351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC
401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA
451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC
501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG
551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG
601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT
651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC
701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG
751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC
801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC
851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC
901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC
951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc
1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC
1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA
1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG
1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA
1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA
1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC
1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC
1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA
1401 AAAACAAGGT TTCCCATTAT GA

This encodes a protein having amino acid sequence (SEQ ID NO: 380): 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV
51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK
151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR
201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS
251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS
301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV
351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE
401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG
451 CILRHRKNLH KLFHYLKKQG FPL*
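
The relationship between a nucleotide sequence such as SEQ ID NO: 379 and the protein it encodes (SEQ ID NO: 380) can be illustrated with a minimal sketch. The following Python is not part of the original analysis. It assumes the standard genetic code and translation in the first reading frame, and the names `CODON_TABLE` and `translate` are hypothetical.

```python
# Standard genetic code in TCAG order.
# Assumption: the specification does not state which translation table was used.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, case is folded,
    and any codon containing an ambiguous base is reported as 'X'."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases of the ORF10ng nucleotide sequence as listed above.
print(translate("ATGGACACAA AAGAAATCCT CGGCTACGCG"))  # MDTKEILGYA
```

Lower-case bases in the listing are folded to upper case here, so any meaning they carry in the listing is not preserved by this sketch.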

ORF10ng (SEQ ID NO: 380) and ORF10-1 (SEQ ID NO: 376) show 96.4% identity in 473 aa
overlap:

20 10 20 30 40 50 60 

or f 1 0 - 1 . pep MDTKEILGYAAGS IGSAVLAVI ILPLLSWYFPADDIGRI VLMQTAAGLTVSVLCLGLDQA 

II I II II 1 1 1 1 1 1 1 i 1 1 1 M 1 1 1 1 1 M 1 1 II 1 1 1 1 1 1 1 M 1 1 1 II I II 1 1 1 1 1 M 1 1 1 1 1 

orf 10ng-l MDTKEILGYAAGS IGSAVLAVI ILPLLSWYFPADDIGRI VLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

25 70 80 90 ' 100 110 120 

orf 10-1 .pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

Mill II M III I MM II MM :|lllll MM Ml 1 1 1! I III Ml MM Mill 

orf 10ng-l YVREYYAAAD KDTLFKTLFLPPLLFSAAIAALLLSRPSLPSE I LFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

30 130 140 150 160 170 180 

orf 10-1 .pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 III II 1 1 1 1 1 1 1 1 ; I I I I I I II I I I : I I M I I I I 

orf 10ng-l LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 

130 140 150 160 170 180 

35 190 200 210 220 230 240 

orf 10-1 .pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 

M I II II M II 1 1 M M 1 1 M II I II 1 1 1 1 1 1 1 1 1 M M II 1 1 M II M II 1 1 II 1 1 M 

orf 1 Ong- 1 NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGI PLALSSLAYWGLASADRLFLKKY 

190 200 210 220 230 240 

40 250 260 270 280 290 300 

orf 10-1 .pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 

lllllllllllll IIIIIIMIIIIIIIIIIIII MM 1 1 1 1 1 1 II 1 1 1 1 1 1 M 

orf 10ng- 1 AGLEQLGVYSMGI S FGGAALLLQS I FSTVWTPYI FRAI EENATPARLS ATAESAAALLAS 

250 260 270 280 290 300 

45 310 320 330 340 350 360 

orf 10-1 .pep ALCLTGIFSPLASLLLPENYAAVRFIWSCMLPPLFCTLAEISGIGLNWRKTRPIALAT 

Minimi i Minimi minium mimmimmimm i n 

orf 10ng-l ALCLTGIFSPLASLLLPENYAAVRFTWSCMLPPLFYTLTEISGIGLNWRKTRPIALAT 

310 320 330 340 350 360 
370 380 390 400 410 420

or f 10-1. pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF 

1 1 1 1 1 M I M 1 1 1 1 1 1 i h 1 1 1 1 1 1 1 1 1 1 1 i 1 1 hi 1 1 M 1 1 1 1 1 II 1 1 II I h 1 1 1 

orf 10ng-l LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF 

370 380 390 400 410 420 



430 440 450 460 470 

orf 10-1. pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCI LRHRKDLHKLFHYLKKQGFPLX 

I hi M 1 1 M 1 1 1 1 1 1 1 I ! 1 1 1 II 1 1 1 1 1 1 II 1 1 1 h 1 1 ■ 1 1 1 1 1 M 1 1 1 1 1 

orf 10ng-l CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide, several
transmembrane segments, and a leucine-zipper motif (4 Leu residues spaced by 6
aa, shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
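
A leucine-zipper motif of the kind described above can be searched for with a simple pattern match. The sketch below is illustrative only and is not the method used in the original analysis. It assumes that "spaced by 6 aa" means six intervening residues between consecutive leucines (L-x(6)-L-x(6)-L-x(6)-L), and the name `find_leucine_zippers` is hypothetical.

```python
import re

# Four leucines, each separated by exactly six arbitrary residues.
# The lookahead lets overlapping candidates be reported as well.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein: str):
    """Return (1-based start position, matched 22-residue segment) pairs."""
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group(1)) for m in ZIPPER.finditer(seq)]


# Synthetic example, not a sequence from this specification.
demo = "MK" + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "GG"
for start, segment in find_leucine_zippers(demo):
    print(start, segment)
```

A match of this pattern is only a candidate; whether the segment behaves as a zipper depends on its structural context.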

Example 45 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 381):

1 . . ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC 

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT 

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

2 01 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

2 51 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAmGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

4 51 AA . AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG 

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 



This corresponds to the amino acid sequence (SEQ ID NO: 382; ORF65): 



1 . . ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD 
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK 
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM 
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 383): 



1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 
251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

4 01 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

5 4 51 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA 

10 701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 384; ORF65-1):

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF65 (SEQ ID NO: 382) shows 92.0% identity over a 150aa overlap with an ORF (ORF65a)
(SEQ ID NO: 386) from strain A of N. meningitidis:

10 20 30 

orf 65 .pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

Nihil I I I I I : I I I 1 I I I I I I I I I I 
30 orf 65a IIAGILF YLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE 

30 40 50 60 70 80 

40 50 60 70 80 90 

orf 65 . pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

III Ihl Mill II II II II III Mllh MINIMI I I MINI I II I II I 

35 or f 6 5a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 

90 100 110 120 130 140 

100 110 120 . 130 140 150 

orf 65 . pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 

Mill MIMIMMMIMM II III MUM MIMI II lllllll III 

40 orf 6 5a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQ I LNSGS I EKARSAAAKEVQKM 

150 160 170 180 190 200 

160 170 180 190 200 210 

orf 65 . pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP

45 orf 65a KTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGISSKVVGYQAGHKTLYRVQSGNMSAD 

210 220 230 240 250 260 
The complete length ORF65a nucleotide sequence (SEQ ID NO: 385) is:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG 

5 151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA 

351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG 

401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA

4 51 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT 

15 651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA 

701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 



20 



This encodes a protein having amino acid sequence (SEQ ID NO: 386): 



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ
51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV
101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK
201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG
251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK*

ORF65a (SEQ ID NO: 386) and ORF65-1 (SEQ ID NO: 384) show 96.5% identity in 289 aa
overlap:

10 20 30 40 50 60 

orf 65a . pep MFMNKFSQSGKGLSGFFFGLILATVI I AGILFYLNQSGQNAFKIPVPSKQ PAETEILKPK - 

1 1 1 1 M 1 1 M ' 1 1 M 1 1 1 i M I M 1 1 1 M 1 1 1 M M 1 1 1 1 II 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf65-l MFMNKFSQSGKGLSGFFFGLILATVI I AG I LFYLNQSGQNAFKI PAS SKQ PAETEILKPK 

35 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 65a . pep NQPKEDIQPEPADQNALSEPDAAKEAEQSDAEKAADKQPVADKADEVEEKADEPEREKSD 

MMIMIIMIMI MINI lllllllllllll MINIMUM Mill: I 

orf 65-1 NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
40 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 65a . pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP

1 MM I II II II I II 1 1 1 III 1 1 1 1 1 II 1 1 II M I II MINIM 

orf 65 - 1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
45 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 65a . pep TPEQILNSGSIEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI 

I I I I I I I I II III II M I I I I II I I I II I I II II II II I I II I M II I II II I II II I 
orf 65-1 TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYLQMGAYADRQSAEGQRAKLAILGI 
50 190 200 210 220 230 240 
250 260 270 280 290

orf65a.pep  SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX
            ||||||||||||||||||||||||||||||||||||||||||||||||||
orf65-1     SSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX
250 260 270 280 290

Homology with a predicted ORF from N. gonorrhoeae 

ORF65 (SEQ ID NO: 382) shows 89.6% identity over a 212aa overlap with a predicted ORF 
(ORF65.ng) (SEQ ID NO: 388) from N. gonorrhoeae: 



30 40 50 60 70 80 

10 ORF65ng IIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLKNQPKEDIQPEPADQNALSEPDVAKE 

II! HI I I I I I : I . I I I I I I I I I : ! I 
ORF65 I LKPHNQLKED I QPDPADQNALS EPDAATE 

10 20 30 



90 100 110 120 130 • 140 

1 5 ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

I II 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I i 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 M I M 

ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

40 50 60 70 80 90 



150 160 170 180 190 200 

20 ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 

I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I Ml I I I I I I I I I I II 
ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
100 110 120 130 140 150 

210 220 230 240 250 260 

25 ORF65ng KNFGQGGSQRIICKWARMPNPGARKGSVPNWQSWAYLPKWSAIRRDIKRFTACKAAICPP 

I I I I I lllllllllh I I I I I I II I I I I I I M h I I II I I I I h I I I I I I I 
ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 

ORF65ng MR 

30 || 

ORF65 MR 

An ORF65ng nucleotide sequence (SEQ ID NO: 387) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 388): 



35 1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ 

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK 

201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR 

40 251 DIKRFTACKA AICPPMR* 



After further analysis, the complete gonococcal DNA sequence (SEQ ID NO: 389) was found to be:

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT
51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC
101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG
151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT
201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT
301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga
351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG
401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA
451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga
501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC
551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa
601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT
651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg
701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG
751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc
801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA
851 TCCGTGcgAT TGAAGGCAAA TAA



This encodes the following amino acid sequence (SEQ ID NO: 390):

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ
51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV
101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK
151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK
201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA
251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK



ORF65ng-l (SEQ ID NO: 390) and ORF65-1 (SEQ ID NO: 384) show 89.0% identity in 290 aa 
overlap: 






10 20 30 40 50 60 

orf 65-1 .pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK 

1 1 1 i 1 1 1 1 1 M I II 1 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 i II IN MINI I 

orf65ng-l MFMNKFSQSGKGLSGFFFGL I LATVI I AGI LLYLNQGGQNAFKI PAPS KQPAETE I LKLK 

10 20 30 40 50 60 









70 80 90 100 110 120 

orf 65-1 .pep NQPKEDIQPEPADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

I MM 1 1 INI III Mlh I IIIIIIIIIMMIIMIII I IIMIIMM 

orf 65ng-l NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 6 5 - 1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

I MIMIIIMMMIMIMM IMMIMMIMMIMM I MIMIIMI 

orf 65ng-l GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP

130 140 150 160 170 180 






190 200 210 220 230 239 

orf 65-1 .pep TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG 
II MM I I I I I I i I M I II I M I : :::::::: MMIMMIMM 
orf65ng-l TPEQILNSRSIEKARSAAAKEVQKMKNFGQGGSQRIICKWARMPTVRSAEGQRAKLAILG

190 200 210 220 230 240 






240 250 260 270 280 290 

orf 65-1 .pep ISSKVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX
1 1 1 Ml 1 1 1 1 1 1 1 1 1 II I Ml 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 IIMIhlhll

orf65ng-l ISSEVVGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX

250 260 270 280 290 

On this basis, including the presence of a putative transmembrane domain in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 46 

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID 
NO: 391): 

10 1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs . s 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT 

15 251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

4 01 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

4 51 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG 

20 501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 392; ORF103):



1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL*

Further work elaborated the DNA sequence (SEQ ID NO: 393) as: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

35 101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

40 351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG 

551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT
651 TGCCGTCCTG TGGCTGTAA

This corresponds to the amino acid sequence (SEQ ID NO: 394; ORF103-1): 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF103 (SEQ ID NO: 392) shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) 
(SEQ ID NO: 396) from strain A of N. meningitidis: 

10 20 30 40 50 60 

15 orf 103 . pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 

II MM 1 1 II I II I II I II II II I II II II Mill II I 1 1 1 1 1 i I IN 1 1 1 1 1 1 1 

orf 103a MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 

10 20 30 40 50 60 

70 80 90 100 110 120 

20 orf 103 .pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

I I II II I I II I I II I I II I I I II II I I I II I I II I I I I I II II II I II I I II I II I II I 
orf 103a GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

25 orf 103 . pep NP ILNRLLP I KS I PACLAVGI LWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

1 1 1 II ! 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M MM 1 1 1 1 M I II 1 1 1 1 1 1 1 1 1 

orfl03a NPI LNRLLP IKS I PACLAVG I LWGWLPCGLVYSASLYALGSGS AATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

30 orf 103 .pep NLLAI G I FS LQLXKI MQNRY I RLCTGLS VS LWALWKLAVLWLX 

II II I I I I I I I II II I M I I I I I I I II I I I I I I I I II I I I I I 
or f 1 0 3 a NLXA I G I FS LQLXK I MQNRY I RLCTGLS VS LWALWKLAVLWLX 

190 200 210 220 

The complete length ORF103a nucleotide sequence (SEQ ID NO: 395) is:



1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

40 201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT 

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC 

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG 

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC 

4 01 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA 

45 4 51 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG 
551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 
651 TGCCGTCCTG TGGCTGTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 396):



1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN
51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

ORF103a (SEQ ID NO: 396) and ORF103-1 (SEQ ID NO: 394) show 97.7% identity in 222 aa 
overlap: 



10 20 30 40 50 60 

1 5 orf 103a . pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 

II I IIMIIIIIMIMMIIII MMMMMIIMM I M Mill 1 1 1 1 1 1 1 , 

orf 103 - 1 MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 

10 20 30 40 50 60 

70 80 ' 90 100 110 120 

20 orf 103a .pep GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

II Mill Mill II II III 1 1 1 1 1: 1 1 1 1 1 1 1 II 1 1 1 M M 1 1 1 1 1 1 1 M II 1 1 1 1 1 ' I 

orf 103-1 GL I LGL I GQVGVS LDQTRVLQNI LYTAANLLLLFLGLYLSG I SS LAAKI EKI GKP I WRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

25 orfl03a. pep NP I LNRLLP I KS I PACLAVGI LWGWLPCGLVYS ASLYALGSGS AATGGLYMLAFALGTLP 

III IIIMI MMMMMMMMMMMMMIMMIMMIM IIIIMM 

orf 103 - 1 NP I LNRLLP I KSIPACLAVG I LWGWLPCGLVYS ASLYALGSGS AATGGLYMLAFALGTLP 

130 140 150 160 170 180 

190 200 210 220 

30 orf 103a . pep NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

II ■IIIMI M 1 1 II 1 1 1 1 1 1 1 II 1 1 1 II 1 1 M I II M 

orf 103 - 1 NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 
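
Identity figures such as those quoted for the alignments above can be recomputed from the two aligned rows. The following Python is a minimal sketch, not the alignment program used for this specification (which is not named here). It assumes the rows are already aligned and of equal length, that columns containing a gap character `-` are excluded, and that an ambiguous residue `X` counts as a mismatch. The name `percent_identity` is hypothetical.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over the gap-free aligned columns."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where neither row has a gap.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# First 60 aligned residues of ORF103a and ORF103-1 as listed above.
row_a = "MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI"
row_b = "MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI"
print(round(percent_identity(row_a, row_b), 1))  # 96.7
```

The value printed applies to this 60-residue row only. How the original software treated `X` residues and the terminal stop symbol is not stated, so a full-length recomputation may differ slightly from the quoted figure.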

Homology with a predicted ORF from N. gonorrhoeae 

ORF103 (SEQ ID NO: 392) shows 95.5% identity over a 222aa overlap with a predicted ORF
(ORF103.ng) (SEQ ID NO: 398) from N. gonorrhoeae:

orf 103 .pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60

MMIMMIMM IIMIIMM MUM 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 

orf 103ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

40 orf 103 .pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

I M 1 1 1 1 M I M M II I IM 1 1 1 1 M M II I M I II I Ml II 1 1 M 1 1 1 II M I II 1 1 

orf 103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120



orf 103 .pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180

IIMIII Ml MMMIIIMIIIIIIIIIIIIIMIMIIIIM III IIIIMIMI

orf 103ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180



orf 103 .pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222

Illlllllllll II I I I II I I I M I I II I I I I I I I I I 
orf 103ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222

The complete length ORF103ng nucleotide sequence (SEQ ID NO: 397) is:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG
51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC
151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT
201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT
251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC
401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG
501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT
651 TGCCGTCCTG TGGCTGTAA



This encodes a protein having amino acid sequence (SEQ ID NO: 398): 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN
51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS
101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL
151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

In addition, ORF103ng (SEQ ID NO: 398) and ORF103-1 (SEQ ID NO: 394) show 97.3% identity 
in 222 aa overlap: 

10 20 30 40 50 60 

orf 103 - 1 . pep MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI 

1 1 1 1 1 1 1 , 1 1 1 1 1 1 1 1 1 1 1 1 1 II I M 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 h I M 1 1 

orf 103ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103-1. pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
I I :| I I I I :hl I I I I I I I I I I I I I hi I I I I I I I I I I I I II I I I I I ' I I I I M I I I I 
orf 103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103 - 1 . pep NP I LNRLLPI KS I PACLAVG I LWGWLPCGLVYS AS LYALGSGSAATGGLYMLAFALGTLP 

1 1 II 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 . 

orf 103ng NP I LNRLLP IKS I PACLAVG I LWGWLPCGLVYS AS LYALGSGSATTGGLYMLAFALGTLP 

130 140 150 160 170 180 



190 200 210 220 

orf 103-1. pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
I I I I I I I ,11 I II I I I I I I I I I I I I I I I I I ' I I I I I I I I I I I 
orf 103ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

190 200 210 220 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
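
Putative transmembrane domains of the kind referred to above are commonly flagged by a sliding-window hydropathy scan. The sketch below is illustrative only: the specification does not state which prediction method was used. It assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a cut-off of 1.6, and the names `KD`, `hydropathy_windows` and `candidate_tm_starts` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not taken from this specification).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    seq = "".join(protein.split()).upper()
    scores = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        scores.append(sum(KD.get(res, 0.0) for res in segment) / window)
    return scores


def candidate_tm_starts(protein: str, window: int = 19, threshold: float = 1.6):
    """1-based start positions of windows whose mean hydropathy meets the cut-off."""
    scores = hydropathy_windows(protein, window)
    return [i + 1 for i, score in enumerate(scores) if score >= threshold]


# First 50 residues of ORF103ng (SEQ ID NO: 398) as listed above.
print(candidate_tm_starts("MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLN"))
```

Windows that pass the cut-off are only candidates; adjacent hits would normally be merged into a single segment and checked against other evidence.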

Example 47 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 399): 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT . TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCG^GGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG 

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT 

4 51 GTCGGGTTTG GGCGCGTATG C . AAGGGCGT GTTGCTGTGT GCGGCAGGCA 

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC 

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC 

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA . . 

This corresponds to the amino acid sequence (SEQ ID NO: 400; ORF104): 

1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV INTLLGHYVM PETFAAP . . . 
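The step from a nucleotide sequence to the amino acid sequence it corresponds to can be illustrated with a short translation sketch. The following Python example is illustrative only and is not the method used to produce the listings here. It assumes the standard genetic code, reads in frame from the first base, and renders any codon containing an undetermined base (such as "." or an ambiguity letter) as X. The name `translate` is a hypothetical helper introduced for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
# Assumption: the standard table is appropriate for these sequences.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown or ambiguous codons become 'X'."""
    # Keep letters and '.', dropping spaces and position numbers.
    cleaned = "".join(ch for ch in dna.upper() if ch.isalpha() or ch == ".")
    usable = len(cleaned) - len(cleaned) % 3  # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        protein.append(CODON_TABLE.get(cleaned[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID NO: 399 as printed above.
print(translate("ATGGAAAACC AAAGGCCGCT CCTAGGCTTT"))
```

Run on the first 30 bases of SEQ ID NO: 399, the sketch prints MENQRPLLGF, which agrees with the first ten residues of SEQ ID NO: 400.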

Further work revealed further partial DNA sequence (SEQ ID NO: 401): 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT 

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA 

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA . . . 

This corresponds to the amino acid sequence (SEQ ID NO: 402; ORF104-1): 



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IXXLLGHYVM PETFAAP . . .

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical HI0878 protein (SEQ ID NO: 1138) of H. influenzae (accession
number U32769)



ORF104 (SEQ ID NO: 400) and HI0878 (SEQ ID NO: 1138) show 40% aa identity in 277aa 
overlap: 



orf104    4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62
            Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P
HI0878    3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

orf104   63 --KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
            K R + +W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F
HI0878   63 LMKVRQYAW----IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

orf104  121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
            K+ + + QKI + + FND+F +GL Y GV+L G++ WV +AQKL+
HI0878  119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

orf104  181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
            +F QQILL++Y A F+P A+ + + + LA +C YCCLNTLIGYGS+ EAL
HI0878  179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

orf104  241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
            W+ SKVS V TL+P+FT++ + + HY P FAAP
HI0878  238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAP 274


Homology with a predicted ORF from N. meningitidis (strain A)

ORF104 (SEQ ID NO: 400) shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) 
(SEQ ID NO: 404) from strain A of N. meningitidis: 
10 20 30 40 50 60

orf 104.pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR

MINIUM MMMMMMI : 1 1 1 1 1 1 1 II 1 1 II I II I M I II 1 1 II 1 1 1 1 M 1 1 

orf 104a MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 104 . pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

Ml 1 1 1 1 1 1 It 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 f 1 1 1 1 1 1 1 1 II 1 1 1 1 1 ! 1 1 1 1 

orf 104a LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 104 . pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 

III MINIUM IMMMIII! Illlllll MIIIIMMIMM lllllll 

orf 104a KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 104 . pep S AQFGPQQ I LLL I YAAS AAVFLP FAEPAH I GSMDGTLAWVC I AYCCLNTL I GYGS FGEAL 

llllllllllll Ml IIIIIIIMII IIMhIIIIMIhlll IIIMIIIMMII! 

orf 104a S AQFGPQQ I LLL I YAAS AAVFLPFAELAH I GS LDGTLAWVCFAYCCLNTL IGYGS FGEAL 

190 200 210 220 230 240 



250 260 270 

orf 104 . pep KHWEAS KVSAVTTLLPVFTVINTLLGHYVMPETFAAP 

I ! I I I I 1 I I I I! I I I I : I I I I I [ I I : I I I I I 

orf 104a KHWEAS KVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVWGGAVTAAVG 

250 260 270 280 290 300 
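The identity figure quoted for an alignment of this kind can be reproduced by counting identical columns over the aligned overlap. The sketch below is a minimal Python illustration under stated assumptions: the two strings are already aligned column by column, the overlap is taken as the shorter of the two, and columns containing a gap ("-") or an undetermined residue ("X") are not counted as identical. The name `percent_identity` is hypothetical, and the alignment program that produced the figures in this document may treat such columns differently.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity)."""
    overlap = min(len(a), len(b))  # compare only the shared columns
    if overlap == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for x, y in zip(a[:overlap], b[:overlap])
        if x == y and x not in "-X"  # assumption: gaps and X never match
    )
    return matches, overlap, 100.0 * matches / overlap


# Residues 241-277 of ORF104 against the start of the matching ORF104a row.
orf104 = "KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP"
orf104a = "KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG"
matches, overlap, pct = percent_identity(orf104, orf104a)
print(f"{matches}/{overlap} identical ({pct:.1f}%)")
```

For this final block alone the sketch reports 34 identical positions out of 37; the figure for the full 277 aa overlap is obtained the same way over all blocks.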



The complete length ORF104a nucleotide sequence (SEQ ID NO: 403) is: 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT
351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 
501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 
601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT 
651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 
851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 
901 GACAGGCTGT TCAAACGCCG CTAG

This encodes a protein having amino acid sequence (SEQ ID NO: 404): 



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV
201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA
251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG
301 DRLFKRR*

ORF104a (SEQ ID NO: 404) and ORF104-1 (SEQ ID NO: 402) show 98.2% identity in 277 aa
overlap:

                    10        20        30        40        50        60
orf104a.pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1    MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf104a.pep LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1    LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf104a.pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf104-1    KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf104a.pep SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
            |||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||
orf104-1    SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf104a.pep KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG
            |||||||||||||||||||||  ||||||||:|||||
orf104-1    KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP
                   250       260       270

Homology with a predicted ORF from N. gonorrhoeae

ORF104 (SEQ ID NO: 400) shows 93.9% identity over a 277aa overlap with a predicted ORF
(ORF104.ng) (SEQ ID NO: 406) from N. gonorrhoeae:



orf 104.pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60

MIMIIMI IIIIIIIMI II M 1 1 1 1 M 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 

orf 104ng MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

orf 104.pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120

I I I I II I II I I I I I I I I I M I I I II II I I I I II I II I I I I II II I I I I I II II I I II II 

orf 104ng LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120

orf 104.pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180

1 1 1 II 1 1 II II M I II M II I M 1 1 II 1 1 1 1 1 1 1 1 1 Mill Mill III MM III 

orf 104ng KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 180
orf 104.pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240



The complete length ORF104ng nucleotide sequence (SEQ ID NO: 405) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 406): 



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*



Further work revealed the complete gonococcal nucleotide sequence (SEQ ID NO: 407): 



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG 

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT

351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT
401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG 
601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt 
651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG 
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 
851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 
901 GACAGGCCGT TCAAACGCCG CTAG 



This corresponds to the amino acid sequence (SEQ ID NO: 408; ORF104ng-1):



1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR*



ORF104ng-1 (SEQ ID NO: 408) and ORF104-1 (SEQ ID NO: 402) show 97.5% identity in 277 aa
overlap:

10 20 30 40 50 60

orf 104-1. pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 i I M 1 1 1 1 1 1 1 1 M I 

orf 104ng-1 MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR

10 20 30 40 50 60

70 80 90 100 110 120 

orf 104 - 1 . pep LPKRRDFSWCS FRLLLLGVAG I S ANFVL I AQGLHY I S PTTTQVLWQ I S PFTM I WGVLVF 

lllllllll Ml I II 1 1 hill III II II II I II II I Mil II III II II Ml II II 1 1 

orf 104ng-l LPKRRDFSWHS FRLLLLGVTG I S ANFVL I AQGLHY I S PTTTQVLWQ I S PFTM I WGVLVF 

10 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 104-1. pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

i I M II II 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1! 1 1 1 II I Ml 1 1 

orf 104ng- 1 KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
15 130 140 150 160 170 180 

190 200 210 220 230 . 240 

orf 104 - 1 . pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I 
orfl04ng-l SAQFGPQQILLLI YAAS AAVFL P FAE PAH I GS LDGTLAWVC FVYCCLNTL I GYGS FGEAL 

20 190 200 210 220 230 240 

250 260 270 

orf 104 - 1 . pep KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 

1 1 1 II 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 II I h 1 1 1 1 1 

orf 104ng-l KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYVGALVWGGAVTAAVG 
25 250 260 270 280 290 300 

In addition, ORF104ng-1 (SEQ ID NO: 408) shows significant homology with a hypothetical
H. influenzae protein (SEQ ID NO: 1138):

gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306
Score = 237 bits (598), Expect = 8e-62

Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%)

Query:  30 QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 88
           Q+P M WG+LPIA++QVL + +A T+VW P
Sbjct:   3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

Query:  89 --KRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 146
           K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F
Sbjct:  63 LMKVRQYAW----IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

Query: 147 KDRMTAAQKIXXXXXXXXXXMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 206
           K+++ QKI +FFND+F +GL Y+ GV+L G++ WV Y +AQKL+
Sbjct: 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

Query: 207 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 266
           +F QQILL++Y A F+P A+ + + L LA +CF+YCCLNTLIGYGS+ EAL
Sbjct: 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

Query: 267 KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMN 306
           W+ SKVS V TL+P+FT++FS + HY P FAAP++N
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277
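
The summary line of a BLAST report such as the one above gives raw counts next to percentages, so the underlying fractions can be recomputed directly. The following Python sketch extracts the counts with a regular expression; it is a minimal illustration, and the names `SUMMARY` and `parse_summary` are hypothetical helpers rather than part of any BLAST tool.

```python
import re

# Matches entries such as "Identities = 114/280" in a BLAST summary line.
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_summary(line: str) -> dict[str, tuple[int, int]]:
    """Return {field: (count, alignment length)} for one summary line."""
    return {name: (int(n), int(d)) for name, n, d in SUMMARY.findall(line)}


line = "Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%)"
counts = parse_summary(line)
for name, (num, den) in counts.items():
    # Print the recomputed fraction alongside the raw counts.
    print(f"{name}: {num}/{den} = {num / den:.3f}")
```

The same helper applies unchanged to the other BLAST summary lines in this document, since they share the "name = count/length" layout.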
Based on this analysis, including the presence of a putative leader sequence and several putative
transmembrane domains in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 

Example 48 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 409): 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG
351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsmmA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT
451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg
501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 
551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 
601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT
651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 
701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 
751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 
801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 
851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 
901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG. . . 

This corresponds to the amino acid sequence (SEQ ID NO: 410; ORF105): 



1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD 

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP . . . 
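
The lower-case letters in the partial nucleotide sequence (for example r, y, s, m and k) read as IUPAC ambiguity codes, each standing for more than one possible base, which would account for the X residues in the translation. The Python sketch below expands such a codon into the concrete codons it could represent. It is an illustration under that assumption; the name `expand_codon` is hypothetical, and the example takes the three bases printed at positions 751-753 of SEQ ID NO: 409, where SEQ ID NO: 410 shows X at residue 251.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_codon(codon: str) -> list[str]:
    """List every unambiguous codon consistent with an ambiguous one."""
    options = [IUPAC[base] for base in codon.upper()]
    return ["".join(combo) for combo in product(*options)]


# "rCC" as printed at positions 751-753 of SEQ ID NO: 409.
print(expand_codon("rCC"))
```

Here the two candidates are ACC and GCC, which encode threonine and alanine respectively under the standard genetic code, so the residue cannot be called from the partial sequence alone.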



Further work revealed the complete nucleotide sequence (SEQ ID NO: 411):

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 



This corresponds to the amino acid sequence (SEQ ID NO: 412; ORF105-1): 



1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV 

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR 

151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF105 (SEQ ID NO: 410) shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) 
(SEQ ID NO: 414) from strain A of N. meningitidis: 



60 70 80 90 100 110 

orf 105 . pep I SERQTAVCLRLQ I QAVWLQS S ALS SRKPTMPTVRFTES VS KQDLDALFEWAKAS YGAES 

1 1 M 1 1 1 ' I h I i M I M II 1 1 1 i 1 1 1 

orf 105a MPTVRFTESVSKHDLDALFEWAKAS YGAES 

10 20 30 



120 130 140 150 160 170 

orf 105 . pep CWKTLYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWH 

MINIMI MIIIIMIMM IMMM 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 Mill 

orf 105a CWKTLYLNGLPLGNLSPEWAERVKKDWEAGCSESSDGIFLNADGWPDMGRRLQHLARIWK 

40 50 60 70 80 90 



180 190 200 210 220 230 

orf 105 . pep CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR 

1 1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 • I II I * 1 1 ! I II Mill IMMIMMIIIIIIIMI 

orf 105a EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR 

100 110 120 130 140 150 



240 250 260 270 280 290 

orf 105 . pep S PHKAVDPNKLDNTXAGGVSGGEMPS EAVCRES S EEAGLDKTLLPL I RPVSQLHSLRS VS 

II IIIIMIMI 1 1 1 1 M IM II M I 1 1 M 1 1 1 1 II I II 1 1 1 1 1 Ml i 1 1 II 

orf 105a SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS 

160 170 180 190 200 210 



300 310 
orf 105. pep RGVHNEILYVFDAVLP 
I I I M I I I I I I M I I 

orf 105a RGVHNE ILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLAAMLSGNMMHDAQLVTLDAF 

220 230 240 250 260 270 
The complete length ORF105a nucleotide sequence (SEQ ID NO: 413) is:

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC 

51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG 

151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG

201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC

251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG

501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 






This encodes a protein having amino acid sequence (SEQ ID NO: 414): 



1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD 

101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR 

151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV

201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

ORF105a (SEQ ID NO: 414) and ORF105-1 (SEQ ID NO: 412) show 93.8% identity in 291 aa
overlap:

10 20 30 • 40 50 60 

orf 105a . pep MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG 

I M 1 1 1 1 1 ihll 1 1 1 i I M I II 1 1 1 1 1 1 1 ■ 1 1 1 1 M 1 1 1 1 1 1 1 M h II 1 1 1 1 M : I 

orf 105 - 1 MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
35 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 105a . pep CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA 

1 1 1 1 1 1 ' 1 1 1 1 1 1 1 1 1 1 1 1 MINI h MM M 1 : 1 1 1 1 M M h 1 1 1 1 : 1 1 M I 

orf 105-1 CSESSDGI FLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 

40 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 105a . pep FRPFGLLSRAVHLNGLVESDGRWHFWIGRRS PHKAVDPDKLDNTAAGGVS SGELPSETVC 

M 1 1 1 1 1 1 M I M II h 1 1 1 M M M M M 1 1 1 M 1 1 h I M 1 1 1 1 1 M h M : M 1 : 1 1 

orf 105-1 FRPFGLLSRA VHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 

45 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 105a . pep RESSEEAGLDKTLLPLIRPVSQLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

lllllllll INI IIIIMI lllllll 1 1 1 1 1 Ml I! 1 1 II III Mil I Ml M I II I 

orf 105-1 RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
50 190 200 210 220 230 240 
250 260 270 280 290

orf 105a .pep FE KMD I GGLLAAMLSGNMMHDAQLVTLDAFCR YGL I DAAHPLS EWLDG I RLX 

Mllllllll II llllllll MIMIIIIII MIMII MM MIMMII 

orf 105-1 FE KMD I GGLLDAMLSGNMMHDAQLVTLDAFCRYGL I DAAHPLS EWLDG I RLX 

5 250 260 270 280 290 

Homology with a predicted ORF from N. gonorrhoeae 

ORF105 (SEQ ID NO: 410) shows 87.5% identity over a 312aa overlap with a predicted ORF 
(ORF105.ng) (SEQ ID NO: 416) from N. gonorrhoeae: 

orf 105 .pep MVARRAHNPKVVGSNPXPATXFQTPRFNAEXVLXLPVSCFLFPAASVFCRI FLPAAISER 60 

10 Illlllllllllllll III M lllllll II MM I Mill II Mill MM 

orf 105ng MVARRAHNPKVVGSNPAPATKYQTPRFNAEGVLF FLFPAASVFCRI FLPAAISER 55 

orf 105 . pep QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 120 

hlllllMIIIIIIIIIII M I M 1 1 1 1 1 1 1 1 Ml U 1 1 1 1 1 1 lllllllllllll 

or f 1 0 5ng QAAVCLRLQ IQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKAS YGAESCWKT 115 

15 orf 105 .pep LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180 

I I I I IIIMIMhlh lllllll llhllllllllllllllllMII I Ml 
or f 1 0 5ng LYLNRLPLGNLS PEWAERI KKDWEAGCSESSNG I FLNADGWPDMGGRLQHLARTWNKAGL 175 

orf 105. pep LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 24 0 

I 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M || Ml lllllllhlhllllllllllllll 

20 orf 105ng LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFWIGRRSPHK 235 

orf 105 . pep AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300 

Illhllll : 1 1 1 1 II 1 1 1 1 1 1 1 : 1 I I II I I I I I I I h I I I I I I h I I I I I I I I I 

orf 105ng AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 295 

orf 105. pep NE I L YVFDAVL P 312 

25 I I I II I II II I I 

orf 105ng NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYG 355 

A complete length ORF105ng nucleotide sequence (SEQ ID NO: 415) was predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 416): 

1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE GVLFFLFPAA SVFCRIFLPA

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA

351 FYRYGLIDAA HPLSEWLDGI RL*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 417): 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 
201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA
351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG
501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 
551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 
601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG 

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 






This corresponds to the amino acid sequence (SEQ ID NO: 418; ORF105ng-1):



1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L* 

ORF105ng-1 (SEQ ID NO: 418) and ORF105-1 (SEQ ID NO: 412) show 93.5% identity in 291 aa
overlap:

10 20 30 40 50 60 

orf 105-1. pep MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 IIIIMIIIIIIIIIII M 1 1 1 M I h I h 1 1 1 1 1 1 1 

orf 105ng-l MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 
30 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 105-1. pep CSESSDGI FLNADGWPDMGGRLQHIALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 

llllllllillll MINI III h 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! I 

orf 105ng-l CSESSDGI FLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA 

35 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 105-1 .pep FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 

MINN IIIMI MhllllllllllllMI IMII hi 1 1 M 1 1 1 1 1 1 1 1 1 

orf 105ng-l FRPFGLLSRAVHLNGLVESNGRWHFW I GRRSPHKAVDPGKLDN I AGGGVS GGEMPSEAVC 

40 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 105-1. pep RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

MIMI'llllh MMIMIIII I II 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I II I 

orf 105ng-l RESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
45 190 200 210 220 230 240 

250 260 270 280 290 

orf 105-1 .pep FEKMD I GGLLDAMLSGNMMHDAQLVTLDAFCRYGL I DAAHPLS EWLDGI RLX 
I I I I I I I M I I I I I llllllllillll I I I I I I I I I I I I I I M I I I I 
orfl05ng-l FEKMD I GGLLDAMLS KNMMHDAQLVTLDAF YR YGL I DAAHPLS EWLDG I RLX 

50 250 260 270 280 290 
Furthermore, ORF105ng-1 (SEQ ID NO: 418) shows homology with a yeast enzyme (SEQ ID NO:
1139):

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE)
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (Z98533) thiamin
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569
Score = 105 bits (259), Expect = 4e-22

Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%)



Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441
           N G+ WRNE + + P+ +ER F FG LS VH + + W+
Sbjct:  96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621
           RRSP K P LDN GG+ + G+ + +E SEEA LD + LI P + ++
Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798
           R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274

Query: 799 LDAFYRYGLIDAAHP 843
           LD R+G+I HP
Sbjct: 275 LDFLIRHGIITPQHP 289





Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 49 

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID 
NO: 419): 



1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA 

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT
451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG
501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 
551 TCCTATCCGC . CAATGA 



This corresponds to the amino acid sequence (SEQ ID NO: 420; ORF107): 
1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF
51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL
101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT
151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF107 (SEQ ID NO: 420) shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) 
(SEQ ID NO: 422) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf107.pep  MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a     MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf107.pep  TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT
            |||||||||||||||||||| |||||| ||| ||||||||||||||||||| ||||||||
orf107a     TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf107.pep  EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf107a     EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ
                   130       140       150       160       170       180

                  189
orf107.pep  KYRFLSXQX
            ||||||
orf107a     KYRFLSANDAVPKQEMMNVKAELLEQKAKLDAYRREEVGLLQEIRTQNLTLXSLPQAAX
                   190       200       210       220       230



The complete length ORF107a nucleotide sequence (SEQ ID NO: 421) is:



1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA

251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG 

601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA 

651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC

701 TCCCCCAAGC GGCATGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 422):

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA*

Homology with a predicted ORF from N. gonorrhoeae 

ORF107 (SEQ ID NO: 420) shows 95.7% identity over a 188aa overlap with a predicted ORF 
(ORF107.ng) (SEQ ID NO: 424) from N. gonorrhoeae: 

orf107.pep  MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60
            ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
orf107ng    MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 60

orf107.pep  TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120
            |:|||||||||||||||||| |||||||||| ||||||||||||||||||||||||||||
orf107ng    TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120

orf107.pep  EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180
            |||||||||||||||||||| ||||||||||||||||:|||||||||||||||||||||:
orf107ng    EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180

orf107.pep  KYRFLSXQ 188
            |||||| |
orf107ng    KYRFLSAQ 188

The complete length ORF107ng nucleotide sequence (SEQ ID NO: 423) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 424): 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF

51 LIFGNYTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ*

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
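
The identity percentages quoted in this example can be checked directly from the listed sequences. The following is a minimal sketch, assuming the two sequences are already aligned without gaps (as ORF107 and ORF107ng are) and assuming that an undetermined residue X never counts as a match. The function name `percent_identity` is illustrative and does not come from the source, which does not state how the figures were computed.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over the ungapped overlap of two pre-aligned sequences."""
    overlap = min(len(seq_a), len(seq_b))
    if overlap == 0:
        raise ValueError("empty overlap")
    matches = sum(
        1
        for a, b in zip(seq_a[:overlap], seq_b[:overlap])
        if a == b and a != "X"  # assumption: X (undetermined) is never a match
    )
    return 100.0 * matches / overlap


# ORF107 (SEQ ID NO: 420) and ORF107ng (SEQ ID NO: 424), stop codon omitted.
ORF107 = (
    "MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILF"
    "LIFGNYTRKTTVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKL"
    "FALSTSRFGAGGSVQQQLKTEAVLKKTLAEQELGRLKLIHGNETRSLKAT"
    "VERLENQELHISQQIDGQKRRIRLAEEMLQKYRFLSXQ"
)
ORF107NG = (
    "MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILF"
    "LIFGNYTRKTTMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKL"
    "FALSTSRFGAGGSVQQQLKTEAVLKKTLAEQELGRLKLIHENETRSLKAT"
    "VERLENQKLHISQQIDGQKRRIRLAEEMLRKYRFLSAQ"
)

if __name__ == "__main__":
    # 180 identical positions out of 188
    print(round(percent_identity(ORF107, ORF107NG), 1))
```

Under these assumptions the 188 aa overlap contains eight mismatches, which reproduces the 95.7% figure given for ORF107 and ORF107ng.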

Example 50 

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID
NO: 425):



1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA

This corresponds to the amino acid sequence (SEQ ID NO: 426; ORF108): 

1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Further work revealed the following DNA sequence (SEQ ID NO: 427):

1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 


This corresponds to the amino acid sequence (SEQ ID NO: 428; ORF108-1): 

1 MLKTSFAVLG GCLLLAACGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y*
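
The correspondence between a nucleotide listing and its amino acid sequence can be verified by translating the DNA in frame. The sketch below assumes the standard genetic code and reports any codon containing an undetermined or ambiguous base as X, which is consistent with how X appears in the listings above. The names `CODON_TABLE` and `translate` are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons with non-ACGT symbols become X."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 bases of SEQ ID NO: 427; compare with the start of SEQ ID NO: 428.
    print(translate("ATGCTGAAAACATCTTTTGCCGTATTGGGC"))  # MLKTSFAVLG
```

A listing described as "believed to be complete" would be expected to start with ATG and end in a stop codon, shown as `*` in the translation.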

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF108 (SEQ ID NO: 426) shows 88.4% identity over a 181aa overlap with a predicted ORF
(ORF108.ng) (SEQ ID NO: 430) from N. gonorrhoeae:

orf108.pep  MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAIAGLDLGQSSE 60
            ||   |||||||||   |||| ||| ||||| |||||||||| |||||||||| ||||||
orf108ng    MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
            |||||||||||||||||||||  || |||| ||||| ||||||| || | ||||||||||
orf108ng    GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
            |||||||||||||||||||| || |||||||||||||||||||||||||||||||||||||
orf108ng    LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181



ORF108-1 (SEQ ID NO: 428) shows 92.3% identity with ORF108ng (SEQ ID NO: 430) over the
same 181 aa overlap:

orf108-1.pep  MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAIAGLDLGQSSE 60
              |||  |||||||||||||||||||||||||| ||||||||||||||||||||| ||||||
orf108ng-1    MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60

orf108-1.pep  GKTNDGKKQISYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120
              |||||||||||||||||||||  || |||| ||||| ||||||| || | ||||||||||
orf108ng-1    GKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKCMETDGKDAPSGWAENGVCHT 120

orf108-1.pep  LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181
              |||||||||||||||||||| || |||||||||||||||||||||||||||||||||||||
orf108ng-1    LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181

The complete length ORF108ng nucleotide sequence (SEQ ID NO: 429) is:

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA AacgccgtCC

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA

This encodes a protein having amino acid sequence (SEQ ID NO: 430):

1 MLKIPFAVLG GCLLLAACGK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI

51 AGLALGQSSE GKTNDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y*

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop, double-
underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
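
A P-loop motif of the kind mentioned above can be located with a simple pattern scan. The sketch below assumes that "ATP/GTP-binding site motif A" refers to the PROSITE pattern PS00017, `[AG]-x(4)-G-K-[ST]`; the source does not name the tool or pattern it used, and the underlining that marked the motif in the original is not recoverable from this text, so the position reported here is only what this pattern finds.

```python
import re

# PROSITE PS00017 (ATP/GTP-binding site motif A, P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")

# ORF108ng (SEQ ID NO: 430), stop codon omitted.
ORF108NG = (
    "MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAI"
    "AGLALGQSSEGKTNDGKKQISYPIKGLPEQNAVRLTGKHPNDLEAVVGKC"
    "METDGKDAPSGWAENGVCHTLFAKLVGNIAEDGGKLTDYLISHSALQPYQ"
    "AGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY"
)


def find_p_loops(protein: str) -> list:
    """Return (1-based start, matched segment) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


if __name__ == "__main__":
    print(find_p_loops(ORF108NG))  # [(56, 'GQSSEGKT')]
```

A pattern this short matches by chance fairly often, so a hit is a hint rather than evidence of nucleotide binding.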



Example 51

The following DNA sequence was identified in N. meningitidis (SEQ ID NO: 431):

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT 

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 432; ORF109): 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD 

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D* 

Further work revealed the following DNA sequence (SEQ ID NO: 433): 



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT

401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 
551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 
601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 434; ORF109-1): 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE 

251 RNPLYQMIVS MF* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)



ORF109 (SEQ ID NO: 432) shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) 
(SEQ ID NO: 436) from strain A of N. meningitidis: 



                    10        20        30        40        50        60
orf109.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109a     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf109.pep  TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
            ||||||||||||||||||||||| || |||||||||||||||||||||||||||||||||
orf109a     TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf109.pep  KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ
            |||||||||||||||||||||    ||
orf109a     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                   130       140       150       160       170       180

The complete length ORF109a nucleotide sequence (SEQ ID NO: 435) is:



1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 436): 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG

151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI

201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE 

251 RNPLYQMIVS MF* 

ORF109a (SEQ ID NO: 436) and ORF109-1 (SEQ ID NO: 434) show 99.2% identity in 262 aa
overlap:

                     10        20        30        40        50        60
orf109a.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf109a.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
             ||||||||||||||||||||||| || |||||||||||||||||||||||||||||||||
orf109-1     TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf109a.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf109a.pep  LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                    190       200       210       220       230       240

                    250       260
orf109a.pep  SMAVKLLIDERNPLYQMIVSMFX
             ||||||||||||||||||||||
orf109-1     SMAVKLLIDERNPLYQMIVSMFX
                    250       260

Homology with a predicted ORF from N. gonorrhoeae 

ORF109 (SEQ ID NO: 432) shows 98.3% identity over a 231aa overlap with a predicted ORF 
(ORF109.ng) (SEQ ID NO: 438) from N. gonorrhoeae: 

orf109.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109ng    MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep  TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120
            ||||||||||||||||||||||| || |||||||||||||||||||||||||||||||||
orf109ng    TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep  KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180
            ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf109ng    KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep  IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231
            |||||||||||||||||||||||||||||||||||||||||||| ||||||
orf109ng    IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231

An ORF109ng nucleotide sequence (SEQ ID NO: 437) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 438): 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VATAFGFLRR

151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD

201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 439): 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT

401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 440; ORF109ng-1):

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE

251 RNPLYQMIVS MF* 

ORF109ng-1 (SEQ ID NO: 440) and ORF109-1 (SEQ ID NO: 434) show 98.9% identity in 262 aa
overlap: 

                        10        20        30        40        50        60
orf109ng-1.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109ng-1.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                ||||||||||||||||||||||| || |||||||||||||||||||||||||||||||||
orf109-1        TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109ng-1.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

                       190       200       210       220       230       240
orf109ng-1.pep  LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
orf109-1        LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                       190       200       210       220       230       240

                       250       260
orf109ng-1.pep  SMAVKLLIDERNPLYQMIVSMFX
                ||||||||||||||||||||||
orf109-1        SMAVKLLIDERNPLYQMIVSMFX
                       250       260

In addition, ORF109ng-1 (SEQ ID NO: 440) shows homology to a hypothetical Pseudomonas
protein (SEQ ID NO: 1140):

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3' REGION (ORF9)
>gi|94984|pir||138164 hypothetical protein 9 - Pseudomonas sp >gi|551929 (M62866)
ORF9 [Pseudomonas denitrificans] Length = 261
Score = 175 bits (439), Expect = 3e-43
Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%)

Query: 41  PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100
           PP+  + TNKLQ             R+G ++ K+ LP+                    D+
Sbjct: 43  PPLQTLGTNKLQGLFGSGSATLSYARRGHVNLKEQLPMALMSAAGAVLGALLATIVPGDV 102

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160
           L A++P LLI +ALYF   P + G  +  +R++ F+F LT+ PL+GFYDGVFGPG GSFF
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220
           ++ F+ L G  +L A ++TK  N   N+G+  VFL  G++++ +   M +G F+GA +G+
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254
           R+A+  G+K+IKPLL+++SI++A++LL D  +PL
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255





Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
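
The source does not state how the putative transmembrane domains were predicted. As one illustration of how such segments are commonly flagged, the sketch below computes a Kyte-Doolittle hydropathy profile with a 19-residue sliding window and reports windows whose mean exceeds 1.6. The window size, the threshold, the treatment of unknown residues as 0.0 and the function names are all assumptions made for this example, not the authors' method.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; unknown residues score 0.0."""
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, value in enumerate(profile) if value > threshold]


# First 60 residues of ORF109ng-1 (SEQ ID NO: 440).
N_TERM = "MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA"

if __name__ == "__main__":
    print(candidate_tm_windows(N_TERM))
```

Adjacent window starts above the threshold would normally be merged into a single candidate segment before interpretation.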



Example 52 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 441): 



1 ..CTGCTAGGGT ATTGCATCGG TTATCGGTAC GGCTGTTGCA GCAAAACCAG

51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG.ATTTT

101 TGGTTTTCTG GGACTGTATG ACGTCTATGC TTCGGCATGG TTTGTCGTTA 

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG 

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA 

251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC 






301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC 

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT 

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC 

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC

501 GGTCGGATTg TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC

551 CCGAAAGTAT .TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA

601 TATTTCCG.A GGGGCAGAgT GCGGATGTGG TTTTCCTGA

This corresponds to the amino acid sequence (SEQ ID NO: 442; ORF110):

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI

51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP

101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL 

151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY 

201 FXRGRVRMWF S* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with ORF88a from N. meningitidis (strain A) 

ORF110 (SEQ ID NO: 442) shows 91.5% identity over a 188aa overlap with ORF88a (SEQ ID 
NO: 332) from strain A of N. meningitidis: 



                    10        20        30        40        50        60
orf88a.pep  MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA
                                          ||||||||||:|||||||||||||||||||
orf110                                    LLGIASVIGTLLQQNQPQTDYLVKFGSFWA
                                                  10        20        30

                    70        80        90       100       110       120
orf88a.pep  QIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
             |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110      XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH
                    40        50        60        70        80        90

                   130       140       150       160       170       180
orf88a.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110      SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL
                   100       110       120       130       140       150

                   190       200       210       220       230       240
orf88a.pep  GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF
            |||||||||||||||||||            ||||  |
orf110      GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF
                   160       170       180       190       200       210

                   250       260       270       280       290       300
orf88a.pep  LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT

orf110      SX

However, ORF88 (SEQ ID NO: 328) and ORF110 (SEQ ID NO: 442) do not align, because they
represent two different fragments of the same protein. 

Homology with a predicted ORF from N. gonorrhoeae 

ORF110 (SEQ ID NO: 442) shows 88.6% identity over a 211aa overlap with a predicted ORF
(ORF110.ng) (SEQ ID NO: 444) from N. gonorrhoeae:

orf110.pep                                LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 30
                                          |||||||||| ||||||||||||||| ||
orf110ng    MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60

orf110.pep  XIFGFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90
             || ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf110ng    RIFDFLGLYDVYASAWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120

orf110.pep  SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150
            ||||||||||||||||||| ||||||  ||||||||||||||||||||| ||||||||||
orf110ng    SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIXAHVALIVICL 180

orf110.pep  GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210
            | ||  ||||||||| | |||  || |||| ||||  | ||||| |||||| || |||||
orf110ng    GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 240

orf110.pep  S 211
            |
orf110ng    S 241

The complete length ORF110ng nucleotide sequence (SEQ ID NO: 443) is predicted to encode a
protein having amino acid sequence (SEQ ID NO: 444): 

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD

51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE

151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXNLLL KLGMLAGSIF

201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S*

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
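
The sequence listings in these examples are printed in blocks of ten symbols with a leading position number on each line. For any computational check they first have to be flattened into one contiguous string. The sketch below is one way to do that, assuming that every character other than a letter, `.` (an undetermined position) or `*` (a stop) is layout and can be discarded; the function name `flatten_listing` is hypothetical.

```python
import re


def flatten_listing(lines):
    """Join numbered, space-separated listing lines into one sequence string.

    Digits and whitespace are dropped, so position numbers that were split
    during text extraction (for example "3 51") are handled as well.
    """
    return "".join(re.sub(r"[^A-Za-z.*]", "", line) for line in lines)


if __name__ == "__main__":
    listing = [
        "1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD",
        "201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S*",
    ]
    print(flatten_listing(listing))
```

Only sequence lines should be passed in, since any letters in surrounding prose would otherwise be kept as if they were residues.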

Example 53 

The following DNA sequence was identified in N. meningitidis (SEQ ID NO: 445): 






1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC 
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG 
101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT

151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT 

201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG 

251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC 

301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG 

351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT 

401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA

451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA

501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG 

551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA 

601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT 

651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG 

701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG 

751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA 

801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC 

851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG 

901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC 

951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG 

1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC 

1051 CGCTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 446; ORF111):



1 MPSETRLPNF IRVLIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL

51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR 

101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 

151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 

201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL 

251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM

301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL 

351 R* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)



ORF111 (SEQ ID NO: 446) shows 96.9% identity over a 351 aa overlap with an ORF (ORF111a)
(SEQ ID NO: 448) from strain A of N. meningitidis:



                     10        20        30        40        50        60
orf111a.pep  MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP
             |||||||||||| ||||| |||||||||||||||||||||||||||||||||||| ||||
orf111       MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf111a.pep  AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH
             |||| |||||||||||||||||||||||||||||||||||||||||||||||| ||||||
orf111       AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf111a.pep  GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf111       GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf111a.pep  AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ
             ||||||||||||||| |||||||||||||||||||||||| |||||||||||||||||||
orf111       AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf111a.pep  GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXADSAM
             |||||||||||||| ||||||||||||| ||||||||||||||||||||||||| |||||
orf111       GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                    250       260       270       280       290       300

                    310       320       330       340       350
orf111a.pep  TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
             |||| ||||||||||||||||||||||||||||||||||||||||||||||
orf111       TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                    310       320       330       340       350

The complete length ORF111a nucleotide sequence (SEQ ID NO: 447) is:

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC
51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG
101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT
201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG
251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG
351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT
401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT
651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG
701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG
751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA
801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC
851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG
901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC
951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG
1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC
1051 CGCTAA



This encodes a protein having amino acid sequence (SEQ ID NO: 448): 



1 MPSETRLPNF IRTLIFALSF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE
201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL
251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM
301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL
351 R*
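
The relationship between a listed nucleotide sequence and the protein it encodes can be illustrated with a short sketch. It assumes the standard genetic code and treats ambiguous bases (such as the N residues in SEQ ID NO: 447) by emitting X unless every possible codon gives the same amino acid. The function name `translate` and the table names are illustrative only and are not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC ambiguity codes (a subset; assumed sufficient for these listings).
AMBIGUOUS = {"N": "ACGT", "K": "GT", "Y": "CT", "W": "AT",
             "S": "CG", "R": "AG", "M": "AC"}


def translate(dna):
    """Translate DNA codon by codon; unresolved codons become X."""
    # Keep letters only, so position numbers and spaces are ignored.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[pos:pos + 3]
        choices = [AMBIGUOUS.get(base, base) for base in codon]
        # Expand every possible concrete codon and collect the residues.
        residues = {CODON_TABLE.get("".join(c), "X") for c in product(*choices)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First thirty bases of SEQ ID NO: 447 and bases 151-180 of the same listing.
start = translate("ATGCCGTCTG AAACACGCCT GCCGAACTTT")
ambiguous = translate("TCAAATAATC GGGACNAACT CCCNTCACCT")
```

Under these assumptions, `start` reproduces the first ten residues of SEQ ID NO: 448 (MPSETRLPNF), and `ambiguous` reproduces residues 51-60 (SNNRDXLPSP): the codon NAA cannot be resolved and is reported as X, whereas CCN is proline whatever the third base.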



Homology with a predicted ORF from N. gonorrhoeae

ORF111 (SEQ ID NO: 446) shows 96.6% identity over a 351aa overlap with a predicted ORF
(ORF111.ng) (SEQ ID NO: 450) from N. gonorrhoeae:

10 20 30 40 50 60 

or f 1 1 lng MPSETRLPNLI RALI FALGFI FLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

I I I I I I I II :! h I I I I I I I I I I I I I I I I I I I I I I I I I M I I I III I II I I I I I I I I I I 
or f 1 1 1 MPSETRLPNFIRVLI FALGFI FLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 

10 20 30 40 50 60 



10 



orf 111 
orf 111 



70 80 90 100 110 120 

AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 

I : I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I 
AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH 

70 80 90 100 110 120 



15 



130 140 150 160 170 180 

orf lllng GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK 

1 1 II 1 1 1 1 1 M 1 1 1 1 II 1 1 1 1 II 1 1 II Ml II 1 1 1 II 1 1 1 II M 1 1 1! 1 1 1 1 1 1 1 1 1 1 

orf 111 GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 

130 140 150 160 170 180 



20 



190 200 210 220 230 240 

orf lllng AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ 

Illllllll MIMIMII IIMIIIIIIIIIIIIIIIIIIMIII MIMIIM 

orf 111 AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ 

190 200 210 220 230 240 



25 



250 260 270 280 290 300 

orf lllng GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISWSDSAM 
IIIMliM MMIIMM MMIMIMM llllllllllll MIMMMIM 
orf 111 GGNTQ 1 I VPLNNRSLATSGD YR I FHVDKNGKRLSH 1 1 NPNNKRP I SHNLAS I S WADS AM 

250 260 270 280 290 300 



310 320 330 340 350 

orf lllng TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX 
30 | | | || | | | | | | | | | | | | | : | | | : || | | | | | | | | | | | || | | | | | | | | | | | | 

orf 111 TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 

310 320 330 340 350 
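
The identity figures quoted for alignments such as the one above can be checked with a simple column count. The sketch below assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and that the denominator is the number of aligned columns; the function name `percent_identity` is illustrative, and the alignment program used for the original analysis may define the overlap differently.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical columns, total columns, percent identity)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != gap)
    return identical, len(seq_a), 100.0 * identical / len(seq_a)


# First ten residues of the gonococcal and meningococcal sequences shown above.
identical, columns, percent = percent_identity("MPSETRLPNL", "MPSETRLPNF")
```

For this ten-residue window the two sequences differ only at the last position, so nine of ten columns are identical.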

The complete length ORF111ng nucleotide sequence (SEQ ID NO: 449) is:



1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg
101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT
201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG
251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG
351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT
401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT
651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG
701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg
751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA
801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac

851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG 

901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC 

951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG 

1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC 

1051 CGCTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 450): 



1 MPSETRLPNL IRALIFALGF IFLNA CSEQT AQTVTLQGET MGTTYTVKYL 
51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR 
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ 
151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE 
201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM
301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL 
351 R* 

This protein shows homology with a hypothetical lipoprotein precursor (SEQ ID NO: 1141) from
H. influenzae: 

sp|P44550|YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR ) gi | 1074292 | pir | 4 
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20) ) gi | 1573128 
(U32702) hypothetical [Haemophilus influenzae] Length = 346 
Score = 353 bits (896), Expect = 9e-97 

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%) 

Query : 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66 

+ LI +1 + L AC ++T + ++L G+TMGTTY VKYL + S K + 

Sbjct: 1 MKKL I SGI I AVAMALSLAACQKET - KVI SLSGKTMGTT YHVKYLDDGS I TATS E - KTHEE 58 

Query: 67 I DDALKEVNRQMS TYQTDS E I SRFNQHT - AGKPLRI S SDFAHVTAEAVRLNRLTHGALDV 125 

1+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV 
Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118 

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185 

TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL 
Sbjct: 119 TVGPWNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178 

Query: 186 SSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQGGNTQ 245 
SSIAKGFGVD+VA +LE+ QNY+VEIGGE+ KGKN G+PW+I IE+P + 

Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238 
Query: 246 I I VPLNNRSLATSGDYRI FHVDKNGKRLSHI INPNNKRP I SHNLAS I S WSDSAMTADGL 305 

++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL 
Sbjct: 239 AVI GLNNMGMAS SGDYR I Y - FEENGKRFAHE IDPKTGYP IQHHLAS I TVLAPTSMTADGL 297 

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 34 9 

STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL 
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341 
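
The summary line of the search report above gives raw counts from which the percentages follow directly. As a minimal sketch, assuming the counts appear in the "name = n/m" form shown, they can be extracted and the percentages recomputed; the names `parse_blast_summary` and `SUMMARY` are illustrative and do not belong to any search program.

```python
import re

# Matches fields such as "Identities = 181/344" in a search summary line.
SUMMARY = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_summary(line):
    """Return {field name: (count, alignment length)} for one summary line."""
    return {name.lower(): (int(num), int(den))
            for name, num, den in SUMMARY.findall(line)}


line = "Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%)"
stats = parse_blast_summary(line)
# Whole-number percentages, truncated rather than rounded.
percentages = {name: 100 * num // den for name, (num, den) in stats.items()}
```

With truncation to whole numbers, 181/344, 247/344 and 4/344 give 52, 71 and 1 respectively, which agrees with the bracketed figures in the report.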

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 54

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 451): 



1 . .CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA 

51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG 

101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG 

151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG 

2 01 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG 

2 51 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG 
301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA 

3 51 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA 
401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG 

4 51 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA 
501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG 
551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC 
601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC 
651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA 
701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG 
751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA. . 

This corresponds to the amino acid sequence (SEQ ID NO: 452; ORF35): 



1 . . PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE
51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ
101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA
151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG
201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR
251 FGIEAGWKGH MSA. .



Computer analysis of this amino acid sequence gave the following results: 



Homology with putative secreted VirG-homologue of N. meningitidis (accession number A32247)



ORF35 (SEQ ID NO: 452) and virg-h protein (SEQ ID NO: 1146) show 51% aa identity in 261 aa
overlap:



Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 63 

+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I 

virg-h 396 KNSDI FDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNESNQLS I 455 

Or f 3 5 64 GVMGGRAGQHASVNGKG - - GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH 121 

G+MGG+A Q ++ + ++ G+G GVYA WHQL+DKQTGAY D W+QYQRF+H 

virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 515 

Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 181 

RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D 
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 575 



Orf3 5 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL 241 

SE V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ + 
virg-h 576 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI 635

Orf35 242 AGRTALEGRFGIEAGWKGHMS 262

+TA+E + G+ K H++ 
virg-h 636 NNKTAIESQLGVAVKIKSHLT 656 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF35 (SEQ ID NO: 452) shows 96.9% identity over a 259aa overlap with an ORF (ORF35a)
(SEQ ID NO: 454) from strain A of N. meningitidis: 

10 20 30 

or f 3 5 . pep PCRRQGDDVYAAHASRQKLWLRF I GGRSHQN I RG 

:||||lll lllllllllllll Mill 
10 orf 35a QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG 

310 320 330 340 350 360 

40 50 60 70 80 90 

orf 3 5 . pep GAAADGWRKGVQ I GGEVFVRQNEGSXLAI GVMGGRAGQHAS VNGKGGAAGSDLYGYGGGV 

III II I I II I II II II I II II I II M I I II I I I I I I I I I I I I I I I I II hill II I 
1 5 or f 3 5a GAAADGRRKGVQIGGEVFVRQNEGSRLAIGVMGGRAGQHASVNGKGGAAGSYLHGYGGGV 

370 380 390 400 410 420 

100 110 120 130 140 150 

or f 3 5 . pep YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 

: 1 1 1 1 - 1 1 1 1 i M 1 1 1 ii 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 1 1 1! i hi 

20 orf 3 5a YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV 

430 440 450 460 470 480 

160 170 180 190 200 210 

orf 3 5 . pep GKGNNVRF YLQPQAQFT YLGVNGGFTDS EGTAVGLLGSGQWQS RAG I RAKTRFALRNGVN 

1 1 1 1 ■ 1 1 M 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 M I II M 1 1 1 1 1 M II M 1 1 1 1 1 1 1 1 1 1 : 

25 or f 3 5a GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGI RAKTRFALRNGVN 

490 500 510 520 530 540 

220 230 240 250 260 

orf 3 5 . pep LQP FAAFNVLHRS KS FGVEMDGEKQTLAGRTALEGRFG I EAGWKGHMSA 
I I I I I I I I I I I I I I I I I I I I I I I II I I I I I , I II I I I I I I I I I I I I 
30 orf 35a LQPFAAFNVLHRSKS FGVEMDGEKQTLAGRTALEGRFG I EAGWKGHMSARIGYGKRTDGD ' 

550 560 570 580 590 600 

orf 3 5a KEAALSLKWLFX 
610 620 

The complete length ORF35a nucleotide sequence (SEQ ID NO: 453) is:

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA 

51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT 

101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC 

151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA 

40 2 01 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT 

251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC 

301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA 

351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA 

4 01 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG 

45 4 51 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA 

501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA

551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA

601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC 

651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG 

701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA 

5 751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA 

801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT 

851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC 

901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA 

951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC 

10 1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT 

1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG 

1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG 

1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG 

1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC 

15 1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC 

1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC 

13 51 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA 

1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG 

1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG 

20 1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA 

1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG 

1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG 

1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA 

1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC 

25 1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA 

1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG 

1851 GCTGTTTTGA 
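
Listings of this kind print a position number followed by blocks of ten bases. The sketch below joins such rows into one continuous sequence and applies two simple consistency checks for a complete open reading frame (length divisible by three, terminal stop codon). It assumes that every alphabetic character in the rows is a base; the helper names `parse_listing` and `looks_like_complete_orf` are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_listing(text):
    """Join numbered ten-base blocks into one upper-case sequence string."""
    # Position numbers, spaces and line breaks are dropped; letters are kept.
    return "".join(ch for ch in text if ch.isalpha()).upper()


def looks_like_complete_orf(seq):
    """True if the length is a multiple of three and the last codon is a stop."""
    return len(seq) % 3 == 0 and seq[-3:] in STOP_CODONS


# Final two rows of SEQ ID NO: 453 as printed above.
rows = """
1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG
1851 GCTGTTTTGA
"""
tail = parse_listing(rows)
complete = looks_like_complete_orf(tail)
```

Since the final row starts at position 1851 and holds ten bases, the full listing is 1860 nucleotides long, that is 620 codons: 619 residues plus the stop codon, which matches the length of SEQ ID NO: 454.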

This encodes a protein having amino acid sequence (SEQ ID NO: 454): 



1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD
51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP
101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ
151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE
201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG
251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC
301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR
351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM
401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY
451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP
501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ
551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG
601 YGKRTDGDKE AALSLKWLF*

Homology with a predicted ORF from N. gonorrhoeae 

ORF35 (SEQ ID NO: 452) shows 51.7% identity over a 261aa overlap with a predicted ORF
(ORF35ngh) (SEQ ID NO: 456) from N. gonorrhoeae: 

orf 3 5 . pep PCRRQGDDVYAAHASRQKLWLRF I GGRSHQN I RG 34 

:::h: h I I I I I hhl ::| 

orf 3 5ngh FTKVQERDDIAI YAQQAQAANTLFALRLNDKNSDI FDRTLPRKGLWLRVIDGHSNQWVQG 370 

50 orf 3 5 . pep GAA - ADGWRKGVQ IGGE VFVRQNEGSXLAI GVMGGRAGQHASVNGKG - -GAAGSDLYGYG 91 

:| ::|:|||||:|||||: III- hlhllhl h = = = = = : = : hi 

orf35ngh KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430

orf35.pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151

: I I I I : I I I I : I I I I I I I : I : I : I I I I I : I II I I : I h * I I I I I - h I I I I I • I I 
orf 3 5ngh AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 4 90 

orf 35 .pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211 

ll|::| llllllhlllllll I H I I = = hllll I I I I : I = : I I : = I I : I 
or f 3 5ngh HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550 

orf 35 .pep GVNLQP FAAFNVLHRS KS FGVEMDGEKQTLAGRTALEGRFG I EAGWKGHMS A 263 

||::|||:| I ::::| ||||:||::::: ::|::| : = |: I hi- 
orf35ngh GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610 

A partial ORF35ngh nucleotide sequence (SEQ ID NO: 455) is predicted to encode a protein 
having partial amino acid sequence (SEQ ID NO: 456): 

1 . . KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS
51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK
101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT
151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID
201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK
251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT
301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL
351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI
401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD
451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR
501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN
551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL
601 TLQASFNRQT SKHHHAKQGA LNLQWTF*

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 55 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 457):

1 . . GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG 

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

35 201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA 

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence (SEQ ID NO: 458; ORF46):

1 . . AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 
51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 
101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 



45 



Further work revealed a further partial nucleotide sequence (SEQ ID NO: 459):

1 . . GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG 

251 GGCACGAAGT CCATTCCCCs TTCGACAACC . ATGCCTCACA TTCCGATTCT 

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG 

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG 

4 01 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA 

451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC 

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC 

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG 

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT 

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence (SEQ ID NO: 460; ORF46-1): 

1 . . AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI 

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS 

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI 

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL 

201 DRSGNAAEAF NGTADIVKNI IGAAGEI 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF46 (SEQ ID NO: 458) shows 98.2% identity over a 111aa overlap with a predicted ORF
(ORF46ng) (SEQ ID NO: 462) from N. gonorrhoeae: 

orf 46 .pep AEYVQFS IDLFSVGKSGGGI PKAKPVFDAKPRWEVDRKLNKLTTR 4 5 

llillllllllllll I IIIIIMIIII 
orf46ng PKTGVPFDGKGFPNFEKHVKYDTKLD I QELSGGGI PKAKPVFDAKPRWEVDRKLNKLTTR 217 

orf 46 . pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105 

I I I I I I I I I I I I I I I I I M . I I I I I I I II I M I I I M II I I I I I M I I I I I I I I I M 
or f 4 6ng EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGHSLTRGDV 277 

orf 46. pep RVIQQTSAPDKHGXLSSDSGN 126 

Illllllllllll Mill 

orf46ng RVIQQTSAPDKHGVLSSDSGN 2 98 

A partial ORF46ng nucleotide sequence (SEQ ID NO: 461) is predicted to encode a protein having 
partial amino acid sequence (SEQ ID NO: 462): 

1 . . RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC 

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER 

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV 

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD 

201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL 

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN*

Further work revealed the complete gonococcal DNA sequence (SEQ ID NO: 463):



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC 

101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA 

151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG 

201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg 

251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa 

301 ttccattcgc ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

4 01 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCC gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT 

951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC 

12 01 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT 
1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA 
1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT 

13 51 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT 

14 01 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA 
1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA 
1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG 
1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC 
1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA 
1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA 
1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA 
1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT 
1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC 
1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG 



This corresponds to the amino acid sequence (SEQ ID NO: 464; ORF46ng-1):



1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK
101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP
301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG
401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP
451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK
501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE
551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD
601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE*

ORF46ng-1 (SEQ ID NO: 464) and ORF46-1 (SEQ ID NO: 460) show 94.7% identity in 227 aa
overlap:

10 20 30 40 

orf 46-1 .pep AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 

1 1 1 1 M 1 1 ! 1 1 1 1 1 1 1 1 1 M I i 1 1 1 1 i 1 1 1 1 1 1 1 i ! 1 1 1 i 1 1 I 

orf46ng-l LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR 

10 20 30 40 50 60

50 60 70 80 90 100

orf 46 - 1 . pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP 

-I MM Ml I MM MM III I -I MM MM II II M 1 1 III III Illlllll III 

orf46ng-l NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP 

70 80 90 100 110 120

110 120 130 140 150 160

orf46-l .pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 

M 1 1 M 1 1 1 1 1 1 1 1 1 1 II I M 1 1 M Ml 1 1 1 1 1 1 1 1 1 M 1 1 1 II I 1 1 1 1 1 1 1 1 1 1 1 1 

orf 4 6ng- 1 VDGFS LYRIHWDGYEHHPADGYDGPQGGGYPAPKGARD I YS YD I KGVAQN I RLNLTDNRS 

130 140 150 160 170 180

170 180 190 200 210 220

orf 46 - 1 . pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 

I I I I I M I I I I I I : I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I 
orf46ng-l TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE 

190 200 210 220 230 240 



orf46-1.pep
orf46ng-1 IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP
250 260 270 280 290 300 



Homology with a predicted ORF from N. meningitidis (strain A) 



30 



ORF46ng-1 (SEQ ID NO: 464) shows 87.4% identity over a 486aa overlap with an ORF (ORF46a)
(SEQ ID NO: 466) from strain A of N. meningitidis:



35 



10 20 30 40 50 60 

orf 46a . pep LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER 

III II III Mill II MINIMI llllll I II MM III I II I II II I MUM I I 

orf46ng-l LGI SRKISLI LS I LAVCLPMHAHASDLANDPF IRQVLDRQHFEPDGKYHLFGSRGELAXR 

10 20 30 40 50 60 



40 



70 80 90 100 110 120 

or f 4 6a . pep SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYI VRFSDHGHEVHSPFDNHASHSDSDEAGSP 

M 1 1 1 M 1 1 1 1 1 1 1 i MM 1 1 MM I M 1 1 M 1 1 1 1 1 M I M M 1 1 1 II II 1 1 1 1 1 1 1 

orf46ng-l NGH I GLGN I QSHQLGHLM I QQAAVEGNI GY I VRFSDHGHKFHS PFDNHASHSDSDEAGS P 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 4 6a . pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS 

I I I I I II M I I II II I I I II I II I I M I I I II I I II II II I I I II I I II I I II I I I I II I 
orf46ng-1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS

130 140 150 160 170 180

190 200 210 220 .230 240 

orf 4 6a . pep TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADI VKNI IGAAGE 

I I ] I h I . I I I :|: I I M I I I I I I I I I M I M I I I I I I I I I I I I I I I I I I I i I I ! I I I 
orf46ng-l TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTAD I VKNI IGAAGE 

190. 200 210 220 230 240 

250 260 270 280 290 300 

orf 4 6a . pep IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 1 1 1 1 1 M II I II I M 1 1 1 1 1 1 1 1 1 1 1 Ml 1 1 

orf46ng-1 IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 46a . pep NAAQGI EAVSNI FTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGE I ALPKGKSAVSDNF 

IIIIMIIIIIII hlhllllllllllllllllllllllllll IIIIIIIMIIIM 
orf46ng-l NAAQGI EAVSNI FMAAI PI KG IGAVRGKYGLGG I TAHPVKRSQMGAI ALPKGKSAVSDNF 

310 • 320 330 340 350 360 

370 380 390 400 410 420 

orf46a.pep ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK
I I I I I I I I II I I I ' I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I- I I ' I I I I I I I 
orf46ng-1 ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK

370 380 390 400 410 420 

430 440 450 460 470 

orf 46a. pep GFPNFEKDVKYDTRINTAVPQVN P I DEP VFN - - PKGS VGS AHSWS I TAR I QYAKLP 

IIIIIM IMIh:: : ::: | :|||: |: | : ::|:| | | 

orf 4 6ng- 1 GFPNFEKHVKYDTKLD- - 1 QELSGGG I PKAKPVFDAKPRWEVDRKLN- KLTTREQVEKNV 

430 440 450 460 470 

480 490 500 510 520 530 

orf 4 6a. pep RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ 

- I I 

orf46ng-l QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS 
480 490. 500 510 520 . 530 



The complete length ORF46a DNA sequence (SEQ ID NO: 465) is: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC 

101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA 

151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG 

201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA 

251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA 

301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT 

751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC 

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

14 01 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC 

14 51 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA 

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A 

This corresponds to the amino acid sequence (SEQ ID NO: 466): 



1 LGISRKISLI LSILAVCLPM HAHA SDLAND SFIRQVLDRQ HFEPDGKYHL 
51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 
101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG 
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 
301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 
401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF
451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG 
501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 
551 GKITHK* 

Based on this analysis, including the presence of an RGD sequence in the gonococcal protein,
typical of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
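
The RGD check mentioned above amounts to a plain substring search over the one-letter protein string. The following is a minimal sketch under that assumption; the function name `find_motif` and the toy sequence are illustrative and do not come from the application.

```python
def find_motif(protein, motif="RGD"):
    """Return the 1-based start position of every occurrence of motif."""
    # Drop the spaces that separate the 10-residue blocks of a listing.
    seq = "".join(protein.split()).upper()
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)
    return positions


# Toy sequence (hypothetical, for illustration only).
print(find_motif("MKTARGDLLV AAARGDK"))
```

A hit only flags a candidate cell-attachment motif; it says nothing about whether the site is surface-exposed or functional.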



Example 56 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 467): 



1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG . . . 

This corresponds to the amino acid sequence (SEQ ID NO: 468; ORF48): 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 
101 NLVPFILTAP APYQIMTGL. . .

Further work revealed the complete nucleotide sequence (SEQ ID NO: 469):

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG

401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCGAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT

1401 GAACTTCAAA ATCAAATAA

This corresponds to the amino acid sequence (SEQ ID NO: 470; ORF48-1): 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG 

451 NLNETFRYLK QGHVAWLNFK IK* 
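
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in reading frame 1 with the standard genetic code. The sketch below assumes that frame and that code; `translate` is an illustrative helper, not a tool named in the application. Codons containing an ambiguous base such as N are reported as X, matching the convention used in the strain A listings.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate a DNA listing in frame 1; unknown codons become 'X'."""
    # Keep letters only, so position numbers, spaces and dots are ignored.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First row of SEQ ID NO: 469 as printed above.
first_row = "ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT"
print(translate(first_row))
```

The trailing two bases of the row do not form a complete codon and are ignored, so the 50-base row yields the first 16 residues of ORF48-1.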

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 



ORF48 (SEQ ID NO: 468) shows 94.1% identity over a 119aa overlap with an ORF (ORF48a)
(SEQ ID NO: 472) from strain A of N. meningitidis:

                    10        20        30        40        50        60
orf48.pep   MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
            ||||||||||||||||||||||||||||| ||||||||||||||||||||| ||||||||
orf48a      MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
                    10        20        30        40        50        60

                    70        80        90       100       110      119
orf48.pep   ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL
            ||||| ||| |||| ||||||||||||||||||||||||||||||| |||| |||||||
orf48a      ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
                    70        80        90       100       110       120

orf48a      LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
                   130       140       150       160       170       180
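
The identity figure quoted for ORF48 and ORF48a can be reproduced by counting matching positions over the ungapped overlap. The sketch below assumes a position-by-position comparison with no gaps, with X (unknown residue) counted as a mismatch; `percent_identity` is an illustrative helper, not the alignment program used for the application.

```python
def percent_identity(a, b):
    """Return (matches, overlap, percent) for an ungapped comparison."""
    a = "".join(a.split())  # drop the spaces between 10-residue blocks
    b = "".join(b.split())
    overlap = min(len(a), len(b))
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches, overlap, 100.0 * matches / overlap


# ORF48 (SEQ ID NO: 468) and the first 119 residues of ORF48a (SEQ ID NO: 472).
ORF48 = """MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN LDYLPAALLI
ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI NLVPFILTAP APYQIMTGL"""
ORF48A_START = """MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN LXYLPAALLI
ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI NLVPFIXTAP ALYQIMTGL"""

matches, overlap, pct = percent_identity(ORF48, ORF48A_START)
print(f"{matches}/{overlap} identical = {pct:.1f}%")
```

Six of the seven mismatches are X positions in the strain A sequence, so the true identity between the two proteins may be higher than the reported figure.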

The complete length ORF48a nucleotide sequence (SEQ ID NO: 471) is:



1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG 

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT 

1401 GAACTTCAAA ATCAAATAA

This encodes a protein having amino acid sequence (SEQ ID NO: 472): 



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN 

51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI

101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV

151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL 

251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR 

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVXWLNFK IK*

ORF48a (SEQ ID NO: 472) and ORF48-1 (SEQ ID NO: 470) show 96.8% identity in 472 aa 
overlap: 

10 20 30 40 50 60 

orf4 8a.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI 

II MM IIIIIIIIMIIMI IIIIMII MIIIMMIMMMIIIII IIIIIMI 

orf 48-1 MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf48a.pep ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL

Mill III 1 1 1 1 1 1 1 M 1 1 1 i 1 1 1 1 1 ' I' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MINIM 

orf48-l ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 4 8a . pep LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA 

I I I II I I I II I I I I I I I I I I I I I I = II I II : I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf48-1 LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

130 140 150 160 170 180 






190 200 210 220 230 240 

orf 4 8a. pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

1 1 1 1 1 Ml 1 1 1 1 II 1 1 M 1 1 1 1 1 1 1 1 M 1 1 1 II 1 1 1 1 Mi 1 1 II M 1 1 1 1 M 1 1 1 II I 

orf 4 8-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf48a.pep ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR

II IMMMIMM MMIMIMMMIMMIMM llllll 

or f 4 8 - 1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 48a . pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I 
orf 48 - 1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 48a . pep LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ 

llllll 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 II III 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 M MINIMI 

orf 48-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 

370 380 390 400 410 420 






430 440 450 460 470 

orf 48a. pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX 

llllll III II II MM Mill llllll II II II II II MM II II MINI 

orf48-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX

430 440 450 460 470 



Homology with a predicted ORF from N. gonorrhoeae

ORF48 (SEQ ID NO: 468) shows 97.5% identity over a 119aa overlap with a predicted ORF
(ORF48ng) (SEQ ID NO: 474) from N. gonorrhoeae: 

orf 48 .pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60 

1 1 1 i ^ 1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 ! 1 1 1 1 1 1 ! 1 1 1 1 MM 

orf48ng MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

orf 48 .pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119 

llllllllllllllll IMIIMIIIIIIIMI MIIIMIIIMIIIMII IIMII 

orf4 8ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120 

The ORF48ng nucleotide sequence (SEQ ID NO: 473) was predicted to encode a protein having
amino acid sequence (SEQ ID NO: 474): 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN

201 PYASMGNGG . . 

Further work identified the complete gonococcal DNA sequence (SEQ ID NO: 475): 



1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT 
51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC
251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG
451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG
501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG 

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG 

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TtcttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT

1401 GCACTTCAAA ATCAAATAA



This encodes a protein having amino acid sequence (SEQ ID NO: 476; ORF48ng-1):

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL

201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVAWLHFK IK*



ORF48ng-1 (SEQ ID NO: 476) and ORF48-1 (SEQ ID NO: 470) show 97.9% identity in 472 aa
overlap: 



10 20 30 40 50 60 

orf48-1.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI

1 1 1 hi I hi 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 M II 1 1 1 1 1 1 1 M 1 1 1 II 1 1 1 1 h II M 

orf48ng-1 MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI

10 20 30 40 50 60 



70 80 90 100 110 120 

orf48-1.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL

II I I I I I I I I I II I II I h I h I I II I I I I I II II I h h I I I h I II I I II I I I I 
orf 4 8ng- 1 ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf48-1.pep LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

I M I II I I I I I I II I : I I II I I I II I I I I I I M I I I I I I I I I I I I I I I I M I I I I I I I I I 
orf48ng-1 LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA

130 140 150 . 160 170 180 

190 200 210 220 230 240 

orf48-1.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP

IIIIIIIIMIMIMIIIIIIIIIIIIIIMIIIhhlllllllllllllllllhll 

orf48ng-l KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf48-1.pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR

I h I I I I I I II I h ■ I I I I I I I I I II I I I I I I I I I I ! h h I I I I I I I I I I I I I II I I I 
or f 4 8ng- 1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf48-1.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE

11 1 [ I I I I I i I I I 1 I I I I : I I I I III I 1 I I I I I I I I I I I 1 

orf 4 8ng- 1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE 

310 320 330 340 350 360 



370 380 390 400 410 420 

orf48-1.pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ

I i 1 1 MM 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 E 1 1 1 1 1 1 1 E 1 1 

orf48ng-l LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ 

370 380 390 . 400 410 420 



430 440 450 460 470

orf48-1.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX

orf48ng-1 FFDQLADLIRRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLHFKIKX

430 440 450 460 470



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
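
The application does not state how the putative transmembrane domains were predicted. One common approach is a sliding-window hydropathy average, sketched below under the assumption of the Kyte-Doolittle scale with a 19-residue window and a cut-off of 1.6; the function names, window size and threshold are illustrative choices, not details taken from the application.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    seq = "".join(protein.split()).upper()
    scores = [KD.get(res, 0.0) for res in seq]  # unknown residues (X) score 0
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, avg in enumerate(profile) if avg >= threshold]


# First 50 residues of ORF48 (SEQ ID NO: 468).
ORF48_START = "MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN"
print(hydrophobic_windows(ORF48_START))
```

Runs of consecutive start positions indicate a hydrophobic stretch long enough to span a membrane; a dedicated signal-peptide predictor would still be needed to distinguish a leader sequence from a transmembrane helix.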

Example 57 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 477): 



1 . . GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT 

51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG 

101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC 

151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA 

201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC 

251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC 

301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA

351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT

401 TGATCAATAT GTACGCC..



This corresponds to the amino acid sequence (SEQ ID NO: 478; ORF53):

1 ..VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA
101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 479):

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG

51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA

1251 ATGA

This corresponds to the amino acid sequence (SEQ ID NO: 480; ORF53-1):

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI

101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV

151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG

351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY

401 LTGFTVLFLL NLAGMFK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF53 (SEQ ID NO: 478) shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) 
(SEQ ID NO: 482) from strain A of N. meningitidis:

10 20 30 

orf53.pep VSGRYRALDRVSKIIIVTLSIATLAAAGIA

I II 1 1 M M M 1 1 M I II 1 1 1 1 1 Ml I M 

orf 53a AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA 
110 120 130 140 150 160

40 50 60 70 80 90 

orf 53 .pep MSRGMQMQSDFIEPTPW TLAGLGFLIALMGWMPA PIEISAINSLWVTEKQRINPSEYRDG 

I Ml II III II II I II II I II II I II II III II III II II I II II Ml II IN II III II 

orf53a MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
170 180 190 200 210 220

100 110 120 130 139 

orf53.pep IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA

Ihlllllllllllllllllll : II M III II I! MM III 
orf53a IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV

230 240 250 260 270 280

orf53a AFIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFD
290 300 310 320 330 340 

The complete length ORF53a nucleotide sequence (SEQ ID NO: 481) is: 

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA

1251 ATGA



This encodes a protein having amino acid sequence (SEQ ID NO: 482): 



1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII

51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI

101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV

151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW

201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL

251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT

301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG

351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY

401 LTGFTVLFLL NLAGMFK*

ORF53a (SEQ ID NO: 482) shows 100.0% identity in 417 aa overlap with ORF53-1 (SEQ ID NO:
480): 



                    10        20        30        40        50        60
orf53a.pep  MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf53a.pep  FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf53a.pep  MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf53a.pep  IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf53a.pep  AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf53a.pep  TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
                   310       320       330       340       350       360

                   370       380       390       400       410
orf53a.pep  IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1     IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
                   370       380       390       400       410

Homology with a predicted ORF from N. gonorrhoeae 

ORF53 (SEQ ID NO: 478) shows 92.1% identity over a 139aa overlap with a predicted ORF 
(ORF53ng) (SEQ ID NO: 484) from N. gonorrhoeae: 

orf53.pep VSGRYRALDRVSKIIIVTLSIATLAAAGIA 30

I II I II I I I I I I I I I I I I I I I I M M I I i I 
orf53ng AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA 91

orf 53 .pep MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 90 

IMIIIII 1 1 1 1 1 1 1 1 ■ M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf53ng MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG 151

orf53.pep IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA 139

I I :| I I M i I I 1 I I II I ,M : Ml :|||:|||l IMIIIII 
orf53ng IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV 211

An ORF53ng nucleotide sequence (SEQ ID NO: 483) was predicted to encode a protein having
amino acid sequence (SEQ ID NO: 484):

1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA 

51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP

101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD

151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA

201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP

251 IVLLEKLGGR HRFGRDFLV* 

Further analysis revealed a further partial gonococcal DNA sequence (SEQ ID NO: 485):

1 . . aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC 
51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA

101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG
151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT
201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG
251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT
301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT
351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT
401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT
451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT
501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA
551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC
601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT
651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG
701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG
751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG
801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT
851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG
901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG
951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG
1001 GACTTTTGGC ATAG

This corresponds to the amino acid sequence (SEQ ID NO: 486; ORF53ng-1):

1 ..KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL
51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF
101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI
151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT
201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL
251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG
301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*



ORF53ng-1 (SEQ ID NO: 486) and ORF53-1 (SEQ ID NO: 480) show 94.0% identity in 336 aa
overlap: 






60 70 80 90 100 110 

orf 53 - 1 . pep ILTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA 

:|| MINIUM M MIMMMM 

orf53ng-1 KKSCVYLWVFLILCIASATINAGAVAIVTA

10 20 30 






120 130 140 150 160 170 

orf 53-1. pep AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM 

I M M M M M M M M M M M M M M I M I M M M M M M I MM MM M M M 

orf53ng-1 AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM

40 50 60 70 80 90 






180 190 200 210 220 230 

orf53-1.pep SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI

MMM MM MIMMMMMIMI IMMMI IMMMI MMMM M 

orf53ng-1 SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI

100 110 120 130 140 150 






240 250 260 270 280 290 

orf 53 - 1 . pep FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA 

MMMMMIMMMMM IMMMMMMMM MMIMMMMMIMM 

orf53ng-1 FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA

160 170 180 190 200 210

300 310 320 330 340 350

orf53-1.pep FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG

1 1 1 1 1 1 1 1 M 1 1 Ml 1 1 1 M 1 1 1 1 M 1 1 hi 1 1 1 1 1 1 : 1 1 II 1 1 M 1 1 1 1 1 1 1 1 1 1 1 

orf53ng-1 FIAFACMYGTTITVVDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG

220 230 240 250 260 270 

360 370 380 390 400 410 

orf53-1.pep VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL

h M I I I , I I I I I I I h I I I h h I I II hhh hi I I I I hh hi hi hi II I 
orf53ng-1 AMAELLKFAMIAAFVSAPVFAWLNYRLVKGDKRHRLTAGMNALAIVGLLYLAGFAVLFLL

280 290 300 310 320 330 



orf 53-1. pep NLAGMFKX 

lhh = 

orf53ng-l NLTGLLAX 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 58 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 487):

1 . . TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT 

51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA 

101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG 

151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT 

201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG 

251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC 

301 GTTCCGCCT . . 

This corresponds to the amino acid sequence (SEQ ID NO: 488; ORF58): 

1 ..LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
101 VPP..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 489): 



1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC 

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC 

851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC
1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC 
1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 
1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA 

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA 

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC
2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA

2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG

2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA

This corresponds to the amino acid sequence (SEQ ID NO: 490; ORF58-1): 

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*

Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF58 (SEQ ID NO: 488) shows 96.6% identity over an 89 aa overlap with an ORF (ORF58a)
(SEQ ID NO: 492) from strain A of N. meningitidis: 



                    10        20        30        40        50        60
orf58.pep   LRETAYVLDSFDRYFVVALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                             |||||||||||||||||||||||||||||||||||||||||||
orf58a           MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD
                         10        20        30        40        50

                    70        80        90       100
orf58.pep   FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP
            |||||||||||||||||||||||||||||||||||||||||||
orf58a      FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD
               60        70        80        90       100       110
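
The identity figure quoted for an ungapped overlap is the number of matching positions divided by the overlap length. The following is a minimal sketch of that arithmetic; the function name `percent_identity` is hypothetical, and the two segments are transcribed here on the assumption that the overlap covers residues 15-103 of ORF58 and residues 10-98 of ORF58a.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over an ungapped overlap of two equal-length segments."""
    if len(a) != len(b) or not a:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Assumed overlap: ORF58 residues 15-103 against ORF58a residues 10-98.
orf58 = ("FVVALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD"
         "FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP")
orf58a = ("LLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPD"
          "FPELALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP")

print(len(orf58), round(percent_identity(orf58, orf58a), 1))  # prints 89 96.6
```

Three mismatches in 89 positions give 86/89, which rounds to the 96.6% stated in the text.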

The complete length ORF58a nucleotide sequence (SEQ ID NO: 491) is: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG 

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC 

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA 

501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC 

1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG

1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC

1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG 

1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT 

1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC

2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT 

2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC

2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA

2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA

2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG

2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG

2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC

2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC

2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 492): 

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL
551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX
601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP XDNA*



ORF58a (SEQ ID NO: 492) and ORF58-1 (SEQ ID NO: 490) show 96.6% identity in 1014 aa 
overlap: 



                    10        20        30        40        50        60
orf58a.pep  MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf58a.pep  LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf58a.pep  EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf58a.pep  EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf58-1     EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf58a.pep  FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS
            ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
orf58-1     FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf58a.pep  QGQSVSDGTAVRDAXRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN
            |||||||||||||| |||||||||||||||||||||||||||| ||||||||||||||||
orf58-1     QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTVVGKRDVEMPSETEN
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf58a.pep  VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMPAXDIPPPPPVSEIY
            |||| ||||||| |||||||||||||||||||||||||||||||| | || |||||||||
orf58-1     VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf58a.pep  NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET
            ||||||| ||||||||||||||||||||||||||||||||| ||||| ||||||||||||
orf58-1     NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf58a.pep  EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP
            |||||||||||||||||||||| |  ||||||||| ||||||||||||||||||||||||
orf58-1     EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf58a.pep  GATQTEEXLLXNSITIEEKXAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKX
             |||||| || |||||||| |||||||||||||||||||||||||||||||||||||||
orf58-1     EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf58a.pep  LARSLGVASIRVVETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
            |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf58-1     LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
                   610       620       630       640       650       660

                   670       680       690       700       710       720
orf58a.pep  TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1     TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
                   670       680       690       700       710       720

                   730       740       750       760       770       780
orf58a.pep  EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI
            ||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||
orf58-1     EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
                   730       740       750       760       770       780

                   790       800       810       820       830       840
orf58a.pep  GNPFSLTPDNPEPLXKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
            ||||||||| |||| |||||||||||||||||||||||||||||||||||||||||||||
orf58-1     GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
                   790       800       810       820       830       840

                   850       860       870       880       890       900
orf58a.pep  QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR
            ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf58-1     QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR
                   850       860       870       880       890       900

                   910       920       930       940       950       960
orf58a.pep  VHGAFASDEEVHRVVEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV
            ||||||||||||||||||||||||||||| |||| |  | || |||| ||||||||||||
orf58-1     VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV
                   910       920       930       940       950       960

                   970       980       990      1000      1010
orf58a.pep  VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPXDNAX
            |||||||||||||||||||||||||||||||||||||||||||||||||| |||
orf58-1     VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
                   970       980       990      1000      1010



Homology with a predicted ORF from N. gonorrhoeae 
ORF58 (SEQ ID NO: 488) shows complete identity over a 9 aa overlap with a predicted ORF
(ORF58ng) (SEQ ID NO: 494) from N. gonorrhoeae: 



orf58.pep   ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP  103
                                          |||||||||
orf58ng                                   SEPDRPVPPASANRADVPTASDGYSDSGNG  30

The ORF58ng nucleotide sequence (SEQ ID NO: 493) is predicted to encode a protein having 
partial amino acid sequence (SEQ ID NO: 494): 



1 .. SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE
51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS
101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR
151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK
201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL
251 IPESRTVVGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA
301 PDAWVVEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA
351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA
401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN
451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV
501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI
551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV
601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN
651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD
701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA
751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML
801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL
851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ
901 MEAEGIVSAP EHNGNRTILV PLDNA*

This partial gonococcal sequence contains a predicted transmembrane region and a predicted
ATP/GTP-binding site motif A (P-loop; double-underlined). Furthermore, it has a domain
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng (SEQ ID NO:
494) and FtsK (accession number P46889) (SEQ ID NO: 1142) shows 65% amino acid identity in a
459 aa overlap:



ORF58ng:  467 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 526
              +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
FtsK:     868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

ORF58ng:  527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 586
              IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
FtsK:     928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

ORF58ng:  587 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 646
              LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
FtsK:     988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

ORF58ng:  647 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 704
               AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
FtsK:    1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

ORF58ng:  705 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762
                L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
FtsK:    1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

ORF58ng:  763 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822
              IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
FtsK:    1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

ORF58ng:  823 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 882
              H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
FtsK:    1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

ORF58ng:  883 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921
              VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
FtsK:    1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
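
The ATP/GTP-binding site motif A (P-loop) mentioned above is conventionally described by the PROSITE-style pattern [AG]-x(4)-G-K-[ST]. The following is a minimal sketch of locating that pattern with a regular expression; the function name `find_p_loops` is hypothetical, and the test segment is the ORF58ng stretch 587-646 transcribed from the alignment, so positions are relative to that segment only.

```python
import re

# PROSITE-style motif A (P-loop) pattern: [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str):
    """Return (1-based start, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# ORF58ng residues 587-646, as transcribed from the alignment with FtsK.
segment = "LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK"
print(find_p_loops(segment))  # prints [(4, 'GTTGSGKS')]
```

The single hit falls in the stretch that is conserved between the gonococcal sequence and FtsK.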

Further work on ORF58ng revealed the complete gonococcal DNA sequence to be (SEQ ID NO:
495):

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA
501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC
851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC
1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA
1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC
1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT
1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG
1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC
1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC
1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG
1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC
1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC





2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA

2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA 

2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT 

2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC 

2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA 

2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA

2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT 

2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 

2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG 

2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC

2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG

2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 496; ORF58ng-1):

1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI
201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV
401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER
501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*

ORF58ng-1 (SEQ ID NO: 496) and ORF58-1 (SEQ ID NO: 490) show 97.2% identity in 1014 aa
overlap:



                    10        20        30        40        50        60
orf58-1.pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf58-1.pep LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf58-1.pep EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL
            |||||| ||||||||||||||||||||||||||||||||||| |||  ||||||||||||
orf58ng-1   EEAETEAAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMQSESKTSPVRPVFKEITL
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf58-1.pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM
            ||||||| |||||||||||||||||| ||||||||||||||||||||||||||||||| |
orf58ng-1   EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf58-1.pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS
            ||||||||||||||||||||||||||||||||||||||||| ||||| ||||||||||||
orf58ng-1   FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf58-1.pep QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTVVGKRDVEMPSETEN
            ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
orf58ng-1   QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf58-1.pep VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWVVEPPEVPKVPMTAIDIQPPPPVSEIY
            |||||||||||||||||| |||||||||||||||||||||| |    ||| |||||||||
orf58ng-1   VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWVVEPPEVPEVAVPEIDILPPPPVSEIY
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf58-1.pep NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET
            ||||||| |||| |||||||||||| ||||||||||||||||||||||||||||||||||
orf58ng-1   NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET
                   430       440       450       460       470       480

                   490       500       510       520       530       540
orf58-1.pep EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP
            |||||||||||||| |||||||||||||||||||| ||||||||||||||||||||||||
orf58ng-1   EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP
                   490       500       510       520       530       540

                   550       560       570       580       590       600
orf58-1.pep EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD
                   550       560       570       580       590       600

                   610       620       630       640       650       660
orf58-1.pep LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI
                   610       620       630       640       650       660

                   670       680       690       700       710       720
orf58-1.pep TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
                   670       680       690       700       710       720

                   730       740       750       760       770       780
orf58-1.pep EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   EGITHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
                   730       740       750       760       770       780

                   790       800       810       820       830       840
orf58-1.pep GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
                   790       800       810       820       830       840

                   850       860       870       880       890       900
orf58-1.pep QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR
            ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf58ng-1   QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR
                   850       860       870       880       890       900

                   910       920       930       940       950       960
orf58-1.pep VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV
            ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||
orf58ng-1   VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV
                   910       920       930       940       950       960

                   970       980       990      1000      1010
orf58-1.pep VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1   VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
                   970       980       990      1000      1010

Furthermore, ORF58ng-1 (SEQ ID NO: 496) shows significant homology to the E. coli protein
FtsK (SEQ ID NO: 1142):






sp|P4688 9|FTSK_ECOLI CELL DIVISION PROTEIN FTSK ) gi | 1651412 | gnl | PID | dl015290 (Dl 
division protein FtsK [Escherichia coli] ) gi | 1651418 | gnl | PID | dl015296 (D90727) Cell 
division protein FtsK [Escherichia coli] )gi| 1787117 (AE000191) cell division 
protein FtsK [Escherichia coli] Length = 1329 
Score = 576 bits (1469) , Expect = e-163 

Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%) 

I EEKLAEFKVKVKWDS YSGPVITRYE I EPDVGVRGNS VLNLEKDLARSLGVAS IRWET 615 
+E +LA+F++K W+ GPVITR+E+ GV+ + NL +DLARSL ++RWE 
VEARLADFRIKADVWYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 92 7 

I PGKTCMGLELPNPKRQMIRLSE I FNS PEFAES KS KLTLALGQD I TGQP WTDLGKAPHL 675 
IPGK +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PW DL K PHL 
I PGKP YVGLELPNKKRQTVYLREVLDNAKFRDNPS PLTWLGKD I AGEP WADLAKMPHL 987 

LVAGTTGSGKSVGVNAMILSMLFKAAPEDVT^IMIDPKMLELSIYEGITHLLAPVVTDMK 735 
LVAGTTGSGKS VGVNAM I LSML+ KA PEDVR IMIDPKMLELS+YEGI HLL WTDMK 
LVAGTTGSGKS VGVNAM I LSMLYKAQPEDVRF I M I DPKMLELS VYEGI PHLLTE WTDMK 104 7 



Query : 


556 


Sbjct : 


868 


Query: 


616 


Sbjct : 


928 


Query: 


676 


Sbjct : 


988 
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Query: 736 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP- - 793 

AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA I +P+ D + 

Sbjct: 1048 DAANALRWCVNEMERRYKLMSALGV^LAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107 

Query: 794 - -LEKiPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851 

L+K P+ I W+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL 
Sbjct: 1108 PVLKKEPYIWLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167 

Query : 852 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911 

IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+ P + P RVHGAF D+EV 
Sbjct: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227 

Query : 912 HRWEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSWLKTRKASISG 971 

H W+ K G P YVD IS SE G G G E DP++D+AV V + RKASISG 
Sbjct: 1228 HAWQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286 

Query: 972 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 1010 

VQR RIGYNRAAR+I+QMEA+GIVS HNGNR +L P 
Sbjct: 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
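
The summary line of the FtsK comparison reports each statistic as a count over the alignment length (459 positions) followed by a whole-number percentage. As a minimal sketch only, the following Python recomputes those percentages from the counts. The function name `parse_blast_stats` is hypothetical and is not part of any BLAST tool, and the sketch assumes the percentages are truncated to whole numbers, which is consistent with the figures printed in this example.

```python
import re


def parse_blast_stats(line):
    """Extract 'Label = matches/length' pairs from a BLAST summary line.

    Returns a dict mapping each label to (matches, length, percent),
    where percent is recomputed by integer truncation (an assumption).
    """
    stats = {}
    for label, num, den in re.findall(r"(\w+)\s*=\s*(\d+)/(\d+)", line):
        num, den = int(num), int(den)
        stats[label] = (num, den, num * 100 // den)
    return stats


# Summary line as printed for the FtsK comparison.
line = "Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)"
for label, (num, den, pct) in parse_blast_stats(line).items():
    print(f"{label}: {num}/{den} = {pct}%")
```

Recomputing the percentages in this way is a quick check that the counts and the printed percentages in a transcribed alignment agree with each other.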



Example 59 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 497): 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC . . GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence (SEQ ID NO: 498; ORF101): 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GXVIAIDAVL 

51 ALVGFWV : 

// 

301 . . . IAIGLFL IYQNGLTLLF EAVEDGKIHF WLGLLPMHII MFVLALILLR 
351 VRSMPSQPFW QAVGKSLTLK GGK* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 499): 



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG 






251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG GAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT 

751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 



This corresponds to the amino acid sequence (SEQ ID NO: 500; ORF101-1): 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA 
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN 
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND 
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 
251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR 
351 SMPSQPFWQA VGKSLTLKGG K* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF101 (SEQ ID NO: 498) shows 91.2% identity over a 57aa overlap and 95.7% identity over a 
69aa overlap with an ORF (ORF101a) (SEQ ID NO: 502) from strain A of N. meningitidis: 



10 20 30 40 50 

orf 101 . pep M I YQRNL I KE LS FT AVG I F WLLAVLVS TQ A I NLLGRAADGXV I A I DAVLALVGFWVX 

1 1 1 1 ; 1 1 1 1 1 ; 1 1 j ; 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 i ! [ mi 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 

orf 101a M I YQRNL I KELS FTAVG I F WLLAVLVS TQA I NLLGXAADXRX - AI DAVLALVGFWVXXM 

10 20 30 40 50 

// 

90 100 110 

orf 101 .pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

IIIIIIIIIIMIMMIIIMIIIIIIII 

orf 101a LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 
280 290 300 310 320 330 



120 130 140 150 

orf 101 .pep LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 

I I I M M : I - M I I I I I I I I I I I I Ml I I I I M M 
orf 101a LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX 
340 350 360 370 
The complete length ORF101a nucleotide sequence (SEQ ID NO: 501) is: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA 

151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC 

201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG 

251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC 

301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA 

401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC 

451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC 

501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG 

551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC 

601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC 

651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA 

701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN 

751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 

1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGGCGGA AAATGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 502): 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA 

51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR 

101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN 

151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX 

251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 

301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR 

351 SMPSQPFWQA VGKSLTLKGG K* 
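
The correspondence between the nucleotide sequence (SEQ ID NO: 501) and the encoded protein (SEQ ID NO: 502) can be checked by reading the DNA in frame from position 1. The following is a minimal sketch, assuming the standard genetic code and assuming that any codon containing the ambiguous base N is reported as X, which matches how the ambiguous positions appear in the listing above. The name `translate` is illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1; whitespace is ignored.

    Codons not in the table (for example, those containing N) become 'X'.
    A trailing partial codon is dropped.
    """
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 bases of SEQ ID NO: 501.
print(translate("ATGATTTATC AAAGAAACCT CATCAAAGAA"))  # MIYQRNLIKE
# Bases 100-120 of SEQ ID NO: 501, which include one ambiguous base.
print(translate("CTG CTC GGC CNT GCC GCC GAC"))      # LLGXAAD
```

Both outputs agree with the corresponding stretches of SEQ ID NO: 502 (residues 1-10 and 34-40).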

ORF101a (SEQ ID NO: 502) and ORF101-1 (SEQ ID NO: 500) show 95.4% identity in 371 aa 
overlap: 

orf 101a .pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60 

MINI II I MM IIIIIINI lllllll Mill II III I IIIIIMIIIIII II 

orf 101- 1 MI YQRNLI KELS FTAVG I FWLLAVLVSTQAINLLGRAADGRVAI DAVLALVGFWVI GMT 60 

40 orf 101a .pep PLLLVLTAF IS TLTVLTR YWRDS EMS VWXS CGLALKQW I RPVMQFAVP FAVLVAVMQLWV 120 

Illlllllllllllllllllllllllll lllllll III lllllll IMMMIMlll 

orfl01-l PLLLVLTAF I S TLTVLTR YWRDS EMS VWLS CGLALKQW I R P VMQ FAVP FAVLVAVMQLWV 12 0 

orf 101a .pep IPWAELRS RE YAE I LKQKQELSLVEAGGFNSLGKRNGRVY FVETFDTESG IMKNLFLREQ 180 

1 1 M 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 I Ml M 1 1 1 1 1 1 1 1 1 1 1 1 . 1 1 1 II 1 1 1 1 1 1 1 

45 orfl01-l IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 

orf 101a .pep DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240 

llllllllll I M 1 1 ! 1 1 1 1 1 1 1 1 1 I I 1 1 1 1 1 1 I 1 1 I I I I I 1 1 I lllllll MM 

orfl01-l DKNGGDNI I FAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVS FQKLNL IISTTPKL 240 



orf 101a .pep IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 
Mlllllll IIIIIIIMIMI I 1 1 1 1 II I M II 1 1 1 M ! 1 1 1 1 1 1 1 1 . 1 1 1 M 1 1 

orf 101-1 IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

orf 10 la .pep LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 360 

I M I I I I I I I I I I M I II I I I I I , I I I I II I I I I I I l-l -I I I I I I I I I I I I I I 
5 orf 101-1 LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 3 60 

orf 101a. pep VGKSLTLKGGK 3 71 

MIMIIMM 
orfl01-l VGKSLTLKGGK 371 

Homology with a predicted ORF from N.gonorrhoeae 

ORF101 (SEQ ID NO: 498) shows 96.5% identity in 57aa overlap at the N-terminal domain and 
95.1% identity in 61 aa overlap at the C-terminal domain, respectively, with a predicted ORF 
(ORF101ng) (SEQ ID NO: 504) from N. gonorrhoeae: 

orf 101 . pep M I YQRNL I KE LS FTAVG I F WLLAVL VS TQ A I NLLGRAADGX V I A I DAVLAL VGFWV 57 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i 1 I i I I I I I 
] 5 orf lOlng M I YQRNL I KE LS FTAVG I F WLLAVLVS TQ A I NLLGRAADGRV - A I D AVLALVG FWV I GM 59 

// 

orf 101 .pep I A I GLFL I YQNGLTLLFE AVEDGKI HFWLG 333 

I I I M I II I II II I I I I I I I II II II I I I I 
orf lOlng SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331 

20 orf 101 .pep LLPMHI IMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373 

I I I I I I I :|- I I I I' I M I I I I M I I 
orflOlng LLPMHI IMFVIAIVLLRVRSMPSQPFWQAVG 362 

The ORF101ng nucleotide sequence (SEQ ID NO: 503) is predicted to encode a protein having 
partial amino acid sequence (SEQ ID NO: 504): 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR 

351 SMPSQPFWQA VG... 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 505): 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC 

101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC 

151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC 

201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC 

301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA 
351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA 

401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT 

451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC 

501 CGaatCcgGC ATCATGAAAA ACCTGTtCCt GcGCGAACAG GACAAAAACG 

551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC 

601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC 

651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta 

701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT 

751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT 

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGgcgGA AAATGA 

This corresponds to the amino acid sequence (SEQ ID NO: 506; ORF101ng-1): 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA 

51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR 

101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN 

151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD 

201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI 

251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI 

301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR 

351 SMPSQPFWQA VGKSLTLKGG K* 

ORF101ng-1 (SEQ ID NO: 506) and ORF101-1 (SEQ ID NO: 500) show 97.6% identity in 371 aa 
overlap: 

30 10 20 30 40 50 60 

orf 101- 1 . pep M I YQRNL I KELSFTAVG I FWLLiAVLVSTQAINLLGRAADGRVAI DAVLALVGFWVI GMT 

! 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I M 1 1 1 1 M 1 1 II 1 1 1 1 

orf 101ng-l M I YQRNL I KELSFTAVG I F WLLAVLVSTQAINLLGRAADGRVAI DAVLALVGFWVIGMT 

10 20 30 40 50 60 

35 70 80 90 100 110 120 

orf 101 - 1 . pep PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 

E I I I I I ! I j I I ! I I I 1 I I I I I I I I I t I I I I I I I I I I I I I 1 I : I : I I I I I t I 

orf 101ng-l PLLLVLTAF I STLTVLTRYWRDS EMS VWLS CGLALKQWI RPVMQFAVP FAI L I AVMQLWV 

70 80 90 100 110 120 

40 130 140 150 160 170 180 

orf 10 1 - 1 . pep I PWAELRSREYAE I LKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 

I I I I I I I I I I I I I I I I I I I I I I I I I II II U I I I I I I I I • M M I I II I I I I I I I I I 
orf 1 0 Ing- 1 I PWAELRSREYAE I LKQKQELSLVEAGE FNNLGKRNGRVY FVETFDTESG IMKNLFLREQ 

130 140 150 160 170 180 

45 190 200 210 220 230 240 

orf 101-1 .pep DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I II I I I I 
orf lOlng- 1 DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVS FQKLNLIISTTPKL 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 101-1 .pep IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 
Illlllllll MIIIMIIIIIMIIMIIIIIIMIIII III IIIIIIIIIIMIIM 

orf lOlng- 1 IDPVSHRRT I STAQL I GSSNPQHQAELMWR I SLTVSVLLLCLLAVPLSYFNPRSGHTYNI 

250 260 270 280 290 300 

310 320 330 340 350 360 

5 orf 101-1 .pep LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 

I I i I I I I II I I M I I II I I I I I I I I I I I II I II I I I I I h:h:| I I I I M I I II I I I ' 
orf 101ng-l LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 

310 320 330 340 350 360 

370 

10 orf 101-1. pep VGKSLTLKGGKX 

IMIIIIIIIM 

orf 1 0 lng - 1 VGKSLTLKGGKX 

370 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
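
The percentage identities quoted in this example are ratios of identical aligned positions to the overlap length. The sketch below shows one way such a figure can be computed from two aligned sequences. It is an illustration under stated assumptions rather than the program used to produce the figures in this document: the function name `percent_identity` is hypothetical, gaps are written as `-`, and a gapped column counts as a mismatch.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding the same residue in both sequences.

    Both inputs must be the same length; '-' marks a gap. Columns that are
    gaps in both sequences are ignored; a gap against a residue is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == "-" and b == "-")]
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Residues 101-120 of ORF101-1 (SEQ ID NO: 500) and ORF101ng-1 (SEQ ID NO: 506).
seg_men = "PVMQFAVPFAVLVAVMQLWV"
seg_gon = "PVMQFAVPFAILIAVMQLWV"
print(round(percent_identity(seg_men, seg_gon), 1))  # 90.0
```

Comparing the two full-length listings position by position gives, by our count, 9 differing residues out of 371, and 362/371 rounds to the 97.6% stated above.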

Example 60 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 507): 

1 . . GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC 

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 
251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC 

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT . GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC 

401 ATTCGTAA 



This corresponds to the amino acid sequence (SEQ ID NO: 508; ORF113): 

1 . . GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNWIAGHG LDARDTDYTR 
51 ILSYHSKIDA PVWGQDVRW AGQNDVAATG DAHSPILNNA AANTSNNTAN 
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein (SEQ ID NO: 1143) of N. meningitidis 
(accession AF030941) 

ORF113 (SEQ ID NO: 508) and pspA (SEQ ID NO: 1143) show 44% aa identity in 179aa overlap: 
orf113 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILSYHSKIDA 60 
GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS + + I+A 
pspa GGGLINAASVTLTSGVPVLNNGNLTGFDVSSGKWIGGKGLDTSDADYTRILSRAAEINA 256 

orf113 P VWGQDVRWAGQNDVAATGDAHS P I LXXXXXXXXXXXXXXGTH IPLFAI DTGKLGGM YA 120 

VWG+DV+W+G+N + G + P AIDT LGGMYA 

pspa GVWGKDVKWSGKNKLDFDG SLAKTASAPSSSDSVTPTVAIDTATLGGMYA 307 

orf113 NKITLIS TVEQAG I RNQGQWFAS AGNVAVNAEGKL VNTGM I AATGENHAVS LHARNVHN 179 
+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A + + + A+ V N 

pspa DKI TL I S TDNGAVI RNKGR I FAATGGVTLS ADGKLSNS GS I DAA EITI S AQTVDN 362 

Homology with a predicted ORF from N. gonorrhoeae 

ORF113 (SEQ ID NO: 508) shows 86.5% identity in 52aa overlap at the N-terminal part and 
94.1% identity in 17aa overlap at the C-terminal part with a predicted ORF (ORF113ng) (SEQ ID 
NO: 510) from N. gonorrhoeae: 






orf 113 GGGFINASCATLTTAKPQYQAGDLSAFKIR 3 0 

15 M I II I I I I I I I I -I I M I I I : M I I I 

orf 113ng SHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224 

or f 1 1 3 QGNWI AGHGLDARDTDYTRI LS YHS KI DAP VWGQDVRWAGQNDVAATGDAHS P I LNNA 9 0 

Ml: MINI IMIHIII 

or f 1 1 3 ng QGNAVI AGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

20 orfll3 IDTGKLGGXVCQQNHLDQYGRASRHS 135 

llllll IIIM II 

orf 1 1 3 ng DFSGFKIRQGNAVI AGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 263 

The complete length ORF113ng nucleotide sequence (SEQ ID NO: 509) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 510): 

1 MNKTLYRVIF NRKRGAWAV AETTKREGKS CADSGSGSVY VKSVSFIPTH 

51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP 

101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL 

151 TRGEARVWN QINSSHPSQL NGYIEVGGRR AEWIANPAG IAVNGGGFIN 

201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ 

251 NHLDQYGRTS RHS* 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 61 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 511): 



1 . . TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 
51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 ATACATTATC AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 
251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 
351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 
401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 
451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 
501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 
551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 
601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 
651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 
701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 
751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 
801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 
851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 
901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 
951 TATCACAGGC AAAGAAAAAG GTGTTT . . 

This corresponds to the amino acid sequence (SEQ ID NO: 512; ORF115): 

1 . . STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV. . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein (SEQ ID NO: 1143) of N. meningitidis 
(accession number AF030941) 

ORF115 (SEQ ID NO: 512) and pspA protein (SEQ ID NO: 1143) show 50% aa identity in 325aa 
overlap: 



Orf115: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGISLPYTSNSFT 60 

STG+S Y E++ +1 +G AY+ + + P + NGI +T 
pspA: 778 STGYSRSPYEPAPEVS-SIRMGISAYKGYAPQQASDIPGTWPWAENGIHPTFT 831 

Orf115: 61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120 

LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN+ +HKRLGDGYYEQ+ 
pspA: 832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890 

Orf115: 121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180 

L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV 
pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950 

Orf115: 181 WLVQKEVKLPDGGTQTVLVPQVYVRVKNGDIDGKGALLSGSNTQINVSGSLKN-SGTIAG 239 

WL + V LPDG TQTVL P+VYVR + D++G+GALLSGS I SG+++N G I AG 
pspA: 951 WLENETVTLPDGTTQTVLKPKVYVRARPKDMNGQGALLSGS WDI G - SGAI ENRGGL I AG 1009 
Orf115: 240 RNALIINTDTLDNIGGRIHAQKSAVTATQDINNIGGMLSAEQTLLLNAGXXXXXXXXXXX 299 

R ALI+N + N+ G + + A DI N G + AE LLL A 
pspA: 1010 REALILNAQNIKNLQGDLQGKNIFAAAGSDITNTGS-IGAENALLLKASNNIESRSETRS 1068 

Orf115: 300 XXXXXXXXXYLDRMAGIYITGKEKG 324 

+ R+AGIY+TG++ G 
pspA: 1069 NQNEQGSVRNIGRVAGIYLTGRQNG 1093 





Homology with a predicted ORF from N. gonorrhoeae 

ORF115 (SEQ ID NO: 512) shows 91.9% identity over a 334aa overlap with a predicted ORF 
(ORF115ng) (SEQ ID NO: 514) from N. gonorrhoeae: 



10 orf 115. pep * STGHSEQNYTLPREITRNISLGSFAYESHRK 31 

III IMIIMMMI IIIIIU I 

or f 1 1 5 ng NEQTFGE KKVFS ENGKLHNYWRARRKGHDETGHREQNYTL PE EITRDIS LGS FAYESHS K 71 

orf 115 .pep ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81 

IIMIIII llllll I 1 I I I M I II I I h I I I I M M M II I i 1 I I 

15 orf 115ng ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131 

orf 115 . pep DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141 

IIIIIIMIIIIMIII Mill II IIMII I Mil III llllll IIIIMIIII I' III 

orf 115ng DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191 

orf 115. pep EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDI VWLVQKEVKLPDGGTQTVLVPQ 201 

20 | | | | M I I I I I I I I I I I I I I II I I I II I I I = I I I I I I I M I I I II I I I I I I I I I I M : I I 

orf 1 1 5ng EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDI VWLVQKEVKLPDGGTQTVLMPQ 251 

orf 115 .pep VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261 

IIMIIII I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I II I II M I II IM I I 
orf 115ng VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALI INTDTLDN I GGR I HAQK 311 

25 orf 115 .pep SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321 

1 1 I II 1 1 1 1 1 1 1 M 1 1 1 1 II II 1 1 1 1 1 1 M 1 1 ^ 1 i 1 1 ^ 1 1 1 1 1 1 ■ 1 1 1 1 1 1 : 1 I ! 

orf 115ng SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371 

orf 115. pep EKGV 325 
I I I I 

30 orf 115ng EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431 

An ORF115ng nucleotide sequence (SEQ ID NO: 513) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 514): 



1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 
501 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 

701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following partial gonococcal DNA sequence (SEQ ID NO: 515): 



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 

651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 

751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 

801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence (SEQ ID NO: 516; ORF115ng-1): 



1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 



This gonococcal protein (ORF115ng-1) (SEQ ID NO: 516) shows 91.9% identity with ORF115 
(SEQ ID NO: 512) over 334aa: 



15 20 30 40 50 60 70 

orf 115ng- 1 . p NEQTFGEKKVFS ENGKLHNYWRARRKGHDETGHREQNYTLPEE I TRD I SLGS FAYESHS K 

III I II II | hi II hi II II II II II I 
orf 115 STGHSEQNYTLPRE I TRN I SLGS FAYESHRK 

10 20 30 



20 80 90 100 110 120 130 

orf 115ng-l.p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 
I I :| I ' I I I I i II I M Mill II I II MM M I I I M I M I I I I 

orf 115 ALSHHAPSQGTELPQSN GISLPYTSNS FTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80 



25 140 150 160 170 180 190 

orf 115ng- 1 . p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 

I ! 1 1 1 1 1 1 1 1 1 M 1 1 1 M II II II II Ml 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 II 1 1 1 1 1 M 1 1 

or f 1 1 5 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRL INEQ I AELTGHRRLDGYQND 

90 100 110 120 130 140 



30 200 210 220 230 240 250 

orf 115ng- 1 . p EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 

1 1 II 1 1 M 1 1 1 1 1 1 1 1 M II 1 1 1 II 1 1 1 1 1 M 1 1 1 II M 1 1 1 Ml 1 1 1 1 1 1 1 1 M M M 

or f 1 1 5 EEQFKALMDNGATAARSMNLSVGI ALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 

150 160 170 180 190 200 



35 260 270 280 290 300 310 

orf 115ng-l.p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 

II II MM I M M II 1 1 1 1 1 1 1 1 1 1 II I M I II 1 1 M I II I M I II II II II I II 1 1 

orf 115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 
210 220 230 240 250 260 



40 320 330 340 350 360 370 

orf 115ng- 1 . p SAVTATQD I NN IGG I LS AEQTLLLNAGNN INNQSTAKSSQNAQGS STYLDRMAG I Y I TGK 

i I i I I I I 1 1 1 J 1 I : I I 1 j 1 I I ! I I I I 1 I i : 1 I I : I I I h I I I I I I I I M I I I I I I II 
orf 115 SAVTATQD INNI GGMLS AEQTLLLNAGNN INSQSTTASSQNTQGS S TYLDRMAG I Y I TGK 

270 280 290 300 310 320 



45 380 390 400 410 420 430 

orf 115ng- 1 . p E KG VLAAQ AGKD INI IAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 

MM 

orf 115 EKGV 
In addition, it shows homology with a secreted N. meningitidis protein (SEQ ID NO: 1143) in the 
database: 

gi | 2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273 

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct : 739 LI VGTPESALDNDETLGTKTI - TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS -SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDI PGTV VP WAENG I HPTFT LPNSSLFAI 840 

Query : 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 
P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKG YL I ETD P AFTD YRKWLGSGYMLAALQQD PNH I HKRLGDG Y YEQKLVNEQ I AKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct : 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 
DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 

Sbjct : 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGS VVD I G - SGAI ENRGGL I AGREAL I LNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+ N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 I KNLQGDLQGKN I FAAAGSD I TNTGS I - GAENALLLKASNN I ESRS ETRSNQNEQGS VRN 1078 

Query: 360 LDRMAG I Y I TGKEKGVLAAQAGKD INI I AGQ I SNQSDQGQTRLQAGRD INLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 E IHFDADNHT I RGSTNEVGS S I QTKGDVTLLSGNNLNAKAAE VGS AKGTLAVYAKND I T I 479 
FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NT I FDSDNYV I RKEQNEVGST I RTRGNLS LNAKGD I R I RAAEVGSEQGRLKLAAGRD I KV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T ++ + A S T +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEI ILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 
SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 

Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTES WGSLNGNTL I S AGKHYTQTGST I S S PQGDVG I S SGKI S I DAAQNRYSQES K 1378 

40 Query: 659 QTYEQKGLTVAFSS PVTD 676 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPWN 1396 
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 62 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 517): 

1 . . TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG 

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT 

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

3 51 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG 

4 01 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 
4 51 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT 
501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG 
551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC 
601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG . CTAAC 
651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA . . . 
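Sequence listings of this kind interleave position numbers, ten-base groups and dots marking an incomplete end. A minimal sketch of how such a listing could be joined into one uninterrupted string before analysis is given below; the helper name `join_sequence_lines` is hypothetical and is not part of the original disclosure.

```python
import re


def join_sequence_lines(lines):
    """Concatenate numbered listing lines into one sequence string.

    Every character that is not a letter (position numbers, spaces,
    dots marking an incomplete end) is discarded. Ambiguity codes
    written in lower case are kept and upper-cased.
    """
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines).upper()


# First two lines of the partial sequence SEQ ID NO: 517 as listed above.
listing = [
    "1 . . TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG",
    "51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA",
]
fragment = join_sequence_lines(listing)
print(len(fragment))
```

Because only letters are retained, the same helper also tolerates position numbers that were split during extraction (for example "3 51").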

This corresponds to the amino acid sequence (SEQ ID NO: 518; ORF117): 

1 . . SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG 

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI 

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ . . . 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein (SEQ ID NO: 1143) of N. meningitidis 
(accession number AF030941) 

ORF117 (SEQ ID NO: 518) and pspA protein (SEQ ID NO: 1143) show 45% aa identity in 224aa 
overlap: 



0rfll7: 4 NLNAKAAEVS S ANGTLAVSANND IN I SAG I NTTHVDDAS KHTGRSGGGNKLVI TDKAQSH 63 

+ + +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T + + 

pspA: 1173 D I R I RAAEVGSEQGRLKLAAGRD I KVEAGKAHTETEDALKYTGRSGGG I KQKMTRHLKNQ 1232 

Orf 117 : 64 HETAQSSTFEGKQWLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123 

+ AST +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++ 

pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292 

Orf 117: 124 QKSGLM - SAG I GFT I GS KTNTQENQSQSNEHTGSTVGSLKGDTT I VAGKHYEQ I GSTVS S 182 

+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS 

pspA: 1293 EKSGLMGSGG I GFTAGS KKDTQTNRS ETVSHTES WGSLNGNTL I S AGKHYTQTGST I SS 1352 
Orf117: 183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226 

P+G+ 1+ IIAAN++ + Q YEQK +TVA S PV" + 
pspA: 1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPWN 1396 

Homology with a predicted ORF from N. gonorrhoeae 

ORF117 (SEQ ID NO: 518) shows 90% identity over a 230aa overlap with a predicted ORF 
(ORF117ng) (SEQ ID NO: 520) from N. gonorrhoeae: 

orfll7.pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30 

I I I 1 I I I I I I I I = I I : [ i E t I hllhll 
orfll7ng IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 480 

10 orfll7.pep AGINTTHVDDASKHTGRSGGGNICLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 90 

:||:: :||ll INI IIMIIIIIIMMMIMIIMIIIIMIMI IMIIMIIMI 

or f 1 1 7ng SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANI LGS 54 0 

orf 117 .pep NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150 

I M 1 1 1 hi 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 i I M 1 1 1 1 1 1 1 1 M M I ! 1 1 1 1 1 Ml 

] 5 orf 117ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600 

orf 117 .pep NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210 

I I I I H I I I I I I I I I I M I I I I Ihllllllll I :|hll Ihhilhllll 
orf 117ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660 

orf 117. pep YEQKXLTVAFS S P VTDLAQQ 230 

20 I I I I I I I I I I I I II I I I I I 

orf 117ng YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720 

An ORF117ng nucleotide sequence (SEQ ID NO: 519) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 520): 

25 1 . . LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

30 251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALI INTDTL 

3 01 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

35 501 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 

701 MPWRLPMQVG RLFKQAKAPK K* 






Further work revealed the following gonococcal partial DNA sequence (SEQ ID NO: 521): 



1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 
151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

2 51 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

3 01 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

3 51 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

4 01 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 
4 51 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 
501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 
551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

12 01 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

12 51 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

14 01 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

14 51 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence (SEQ ID NO: 522; ORF117ng-1): 

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

2 51 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

4 01 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SG I HAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQWLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ A I AVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 
ORF117ng-1 (SEQ ID NO: 522) shows the same 90% identity over a 230aa overlap with ORF117 
(SEQ ID NO: 518). In addition, it shows homology with a secreted N. meningitidis protein (SEQ ID 
NO: 1143) in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273 

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 
L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
10 Sbjct: 739 LIVGTPESALDNDETLGTKTI -TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS -SIR 796 

Query : 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASD I PGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 
15 P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 

Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query : 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct : 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

20 Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN- SGTI AGRNALI INTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGS WD I G - SGA I ENRGGL I AGREAL I LNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 
+ N+ G + + A DI N G I AE LLL A NNI ++S +S+QN QGS 

25 Sbjct: 1020 I KNLQGDLQGKN I FAAAGSD I TNTGS I - GAENALLLKASNNI ESRSETRSNQNEQGS VRN 1078 

Query: 360 LDRMAG I Y I TGKEKGVLAAQAGKD INI I AGQ I SNQSDQGQTRLQAGRD INLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 






Query: 420 E I HFDADNHT I RGSTNEVGS S I QTKGDVTLLSGNNLNAKAAEVGS AKGTLAVYAKND I T I 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 

Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 53 9 

+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 

Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGI KQKMTRHLKNQNGQAVSGTLDGKEI ILVSGRDITVTG 1258 

35 Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 

Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query : 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 
++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + + + 

40 Sbjct: 1319 ETVSHTESWGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 



Query: 659 QTYEQKGLTVAFSS PVTD 676 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAI SVPWN 1396 
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 63 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 523): 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

.51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

2 01 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 
251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

3 01 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 
351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

4 01 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 
4 51 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 
501 CGTGCGCATC GACTTCATCT CCTAT . . . 

This corresponds to the amino acid sequence (SEQ ID NO: 524; ORF119): 



1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK 

151 PLITLKELSK VELSWFDVRI DFISY... 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 525): 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

3 51 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

4 01 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 
451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 
501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 
551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 
601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 
701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 
751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 
801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 
851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 
901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 
951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

12 01 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 
1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 526; ORF119-1): 

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 
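The correspondence between a nucleotide sequence and its amino acid sequence can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the choice to report any codon containing an ambiguity code as X are illustrative assumptions and are not a description of the software actually used.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1.

    Codons containing anything other than A, C, G or T (for example
    the ambiguity codes w and y) are reported as X; stop codons are
    reported as *. Trailing bases that do not fill a codon are ignored.
    """
    dna = dna.upper()
    residues = []
    for i in range(0, len(dna) - 2, 3):
        residues.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(residues)


# First 50 nucleotides of SEQ ID NO: 525 as listed above.
start = "ATGATTTACATCGTACTGTTTCTAGCTGTCGTCCTCGCCGTTGTCGCCTA"
print(translate(start))
```

Applied to the fragment AGCAwAACC from the partial sequence (SEQ ID NO: 523), the same function yields SXT, which is how an unresolved base shows up as X in the partial amino acid sequence.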

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF119 (SEQ ID NO: 524) shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) 
(SEQ ID NO: 528) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf 119 .pep MIYIVLFLAWl^WAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 
20 | M | | | | | | : | | | | M | | M | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I | | | 

orf 119a MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 119 .pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 
25 | | | | | | | | | | | | | | | || | | || | | | | | | | | | | || | | | | || | | | | || | | | | | | | | | I | | | 

orf 119a , MPKPQPAVKKTAKSQDPAMRNLQEQDAVY I AKQKQAKASP FKTEIETALE ESGIIGNSAH 

70 80 90 " 100 110 120 

130 140 150 160 170 

orf 119 .pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 

30 II llllllll Mill llhlllllllllllllllllllll llllhlllll 

orf 119a TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 






orf 119a AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

190 200 210 220 230 240 

The complete length ORF119a nucleotide sequence (SEQ ID NO: 527) is: 



1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

40 151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 
3 01 . TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

3 51 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 
45 4 01 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 

4 51 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 
501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

5 701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

10 951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

15 1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 528): 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119a (SEQ ID NO: 528) and ORF119-1 (SEQ ID NO: 526) show 98.6% identity in 428 aa 
overlap: 



10 20 30 40 50 60 

orf 119a. pep MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

I II I II I I I :| I I I I I M I I I M II I I I I I I I I I I M I I i ! II I I I I I I I I I i I || 
orf 119-1 MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM 

35 10 20 30 40 50 60 

70 80 90 100 110. 120 

orf 119a . pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGI IGNSAH 

1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 j 1 1 1 1 1 1 

or f 1 1 9 - 1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTE I ETALEESGI IGNSAH 

40 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 119a. pep TVPEPQTGHS APKPADAPAKPVPVPQTPAKPL I TLKELSKVELPWFDVRFDFISYIALTE 

II I II I I I I I I I I II I I |:| I I I I I I I I I I I ■ I ■ I I I I I I I I I I I I I I I I I I I I ■ I I 
orf 119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

45 130 140 150 160 170 180 

190 200 210 220 230 240 

orf 11 9a. pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

1 1 1 1 1 1 1 1 1 1 1 I i II I II I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 M M 1 1 M I 1 1 I II I 1 1 1 1 

orf 119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
50 190 200 210 220 230 240 
250 260 270 280 290 300 

orf 119a . pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

1 1 1 i M I h 1 1 M I M 1 1 1 1 i II 1 1 1 1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 1 1 1 1 M 1 1 M II 1 1 

or f 1 1 9 - 1 AFNRQVDAFAQSMGGQTLHTDLAAF I EVAS ALDAFCARVDQT I AI HLVS PTS I SGVELRS 

5 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 119a . pep AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

1 1 1 M 1 1 1 ! 1 1 1 1 1 1 1 1 1 M 1 1 1 1 II I M 1 1 1 1 1 1 1 1 I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 
10 310 320 330 340 350 360 

370 380 390 400 410 420 

orf 119a .pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

I IMIMMIIII IIIIIIIMIIMIIIIIIIIIIMIIIIIIIMI IMIMII 

orf 119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDTOTWLARQSEMLKVGIEPGG 
15 370 380 390 400 410 420 

429 

orf 119a. pep KTALRLFSX 

MINIMI 
orf 119-1 KTALRLFSX 
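The identity figures quoted for these overlaps are the fraction of aligned positions carrying the same residue. A minimal sketch of that calculation follows, assuming two already-aligned rows of equal length in which "-" denotes a gap; the function name `percent_identity` is hypothetical, and the alignment program used for the original analysis may count the overlap length differently.

```python
def percent_identity(row_a, row_b):
    """Percentage of aligned columns with identical residues.

    Both rows must have the same length; a gap character never
    counts as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for x, y in zip(row_a, row_b) if x == y and x != "-")
    return 100.0 * matches / len(row_a)


# Residues 1-60 of ORF119a and ORF119-1, taken from the alignment above.
orf119a = "MIYIVLFLAAVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM"
orf119_1 = "MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM"
print(round(percent_identity(orf119a, orf119_1), 1))
```

Over these first 60 positions the two rows differ at positions 10 and 58, so the value for this window is lower than the figure quoted for the full overlap.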

Homology with a predicted ORF from N. gonorrhoeae 

ORF119 (SEQ ID NO: 524) shows 93.1% identity over a 175aa overlap with a predicted ORF 
(ORF119ng) (SEQ ID NO: 530) from N. gonorrhoeae: 



orf 119 .pep MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 60 

M 1 1 M II 1 : 1 1 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I II 1 1 II I IMMIMMM II 

25 orf 1 1 9ng MI Y I VLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 6 0 

orf 119 .pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 120 

1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 

orf 119ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 120 

orf 119 .pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 175 

30 M II M II 1 1 1 1 1 1 1 1 M H 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h II I II 

orf 119ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 180 



The complete length ORF119ng nucleotide sequence (SEQ ID NO: 529) is: 



1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

35 51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

40 301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

4 01 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

4 51 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

45 551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801- AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

5 851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGGGAAAAAA CCTTCGACGA 

10 1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 530): 

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 


ORF119ng (SEQ ID NO: 530) and ORF119-1 (SEQ ID NO: 526) show 98.4% identity over 428 aa 
overlap: 

10 20 30 40 50 60 

orf 119ng MIYIVLFLAAVLAWAYNNYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

30 I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I M I I I II I I I I 

orf 119-1 MIYIVLFLAVVLAVVAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 119ng MPKPQPAVKKP AKPQDS AMRNLQEQDAVY I AKQKQAKAS PFKTE I ETALEE IG 1 1 GNS AH 

35 IIIIIIIIM Mill II llllllll 1 1 1 1 1 1 1 1 MM M 1 1 1 1 1 III llllllll 

orf 119-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 119ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

40 | || || | || | | | | | | | | | | | | | : | | | | | | | | || II II I II I I I I I I I II II I I I I I I I I I I 

orf 119-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 119ng AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

45 || | | | | | | | || | | | | | || II II II II I II I I I I I I I I I I I M I M I I I I M I II II I I I I 

orf 119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 119ng AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
50 | | | | | : | | | | | | | | | | | | | | | | | | | | || | | | | || | | || I I I II M M I I I I M I I I I I I I 

orf 1 1 9-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf119ng AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 

orf119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf119ng GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 

orf119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

370 380 390 400 410 420 

429 

orf119ng KTALRLFSX 

||||||||| 

orf119-1 KTALRLFSX 


Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 64 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 531): 



1 . .GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC . ATCAG 

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA 

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG 

151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT 

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT 

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC 

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC 

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATGGCGT 

401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA 

451 TTGGCACAGG ATTGA 



This corresponds to the amino acid sequence (SEQ ID NO: 532; ORF134): 

1 ..ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA 

151 LAQD* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 533): 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

4 01 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

5 451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

10 701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

. 801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

15 951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA 
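The relationship between the partial sequence and the complete sequence can be illustrated by locating the start of the partial fragment within the full-length listing. The following is a minimal sketch using an exact substring search; the function name `locate` is hypothetical, and an exact match is assumed only for the unambiguous leading stretch of the fragment, since the partial sequence contains unresolved positions further on.

```python
def locate(fragment, full):
    """Return the 1-based position of fragment within full, or None."""
    index = full.upper().find(fragment.upper())
    return index + 1 if index >= 0 else None


# Positions 701-750 of the complete sequence (SEQ ID NO: 533) as listed above.
excerpt_start = 701
excerpt = "AAGCGCGGCACGGCACGGAAGATTTCTTCATGAACAACAGCGACAGCATC"

# First 40 nucleotides of the partial sequence (SEQ ID NO: 531).
partial_head = "GCGCGGCACGGCACGGAAGATTTCTTCATGAACAACAGCG"

offset = locate(partial_head, excerpt)
position_in_full = excerpt_start + offset - 1
print(position_in_full)
```

In this sketch the head of the partial fragment is found two bases into the excerpt, which places its first base at position 703 of the complete sequence.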






This corresponds to the amino acid sequence (SEQ ID NO: 534; ORF134-1): 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT 

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT 

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK 

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM 

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI 

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA 

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS 

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD* 






Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical protein o648 (SEQ ID NO: 1144) of E. coli (accession number 
AE000189) 

ORF134 (SEQ ID NO: 532) and o648 protein (SEQ ID NO: 1144) show 45% aa identity in 153aa 
overlap: 



Orf134: 2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61 

RHG +DFF N D + + VE TT T++ VVGGIGVMNIMLVSVTERT+EI 

o648: 496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555 

Orf134: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121 

GIRMA+GAR ++ QQFLIEA F+ + + S ++++ 

o648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615 

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154 

A CST GI FG++PA AA+L+P+DALA++ 

o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648 





Homology with a predicted ORF from N. meningitidis (strain A) 

ORF134 (SEQ ID NO: 532) shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) 
(SEQ ID NO: 536) from strain A of N. meningitidis: 

                                                      10        20        30
orf134.pep                                    ARHGTEDFFMNNSDXIRQIVESTTGTMKLL
                                              |||||||||||||| |||||||||||||||
orf134a         GESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTEDFFMNNSDSIRQIVESTTGTMKLL
                   210       220       230       240       250       260

                        40        50        60        70        80        90
orf134.pep      ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG
                |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
orf134a         ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG
                   270       280       290       300       310       320

                       100       110       120       130       140       150
orf134.pep      LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134a         LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA
                   330       340       350       360       370       380

orf134.pep      LAQDX
                ||||
orf134a         LAQDX

The complete length ORF134a nucleotide sequence (SEQ ID NO: 535) is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG

101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 

751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 536): 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT
51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI
251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*
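
The relationship between the ORF134a nucleotide listing (SEQ ID NO: 535) and the protein listing above can be checked by translating the nucleotide rows with the standard genetic code. The sketch below is illustrative only; the helper names `parse_listing` and `translate` are hypothetical, and it assumes the listing is read in frame 1 from its first base.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def parse_listing(text: str) -> str:
    """Strip position numbers and whitespace from a numbered sequence listing."""
    return re.sub(r"[^A-Za-z]", "", text).upper()


def translate(dna: str) -> str:
    """Translate complete codons in frame 1; unrecognised codons become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First row of the ORF134a nucleotide listing (SEQ ID NO: 535).
first_row = "1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT"
print(translate(parse_listing(first_row)))  # MSVQAVLAHKMRSLLT
```

The printed residues match the start of SEQ ID NO: 536. A row that does not start on a codon boundary has to be trimmed to the reading frame before translation.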






ORF134a (SEQ ID NO: 536) and ORF134-1 (SEQ ID NO: 534) show 100.0% identity in 388 aa 
overlap: 



orf134a.pep  MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSIGTNTISIFPGRG

orf134a.pep  FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV

orf134a.pep  RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

orf134a.pep  ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE

orf134a.pep  DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA

orf134a.pep  IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf134-1     IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC

orf134a.pep  STGIGIAFGFMPANKAAKLNPIDALAQDX
             ||||||||||||||||||||||||||||
orf134-1     STGIGIAFGFMPANKAAKLNPIDALAQDX



Homology with a predicted ORF from N. gonorrhoeae 

ORF134 (SEQ ID NO: 532) shows 96.8% identity over a 154aa overlap with a predicted ORF 
(ORF134.ng) (SEQ ID NO: 538) from N. gonorrhoeae: 






orf 134 .pep 
orf 134ng 



ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 3 0 

IIIIIIIIIM I I Mlllllllll 
GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 264 



90 



40 



orf 134 .pep I S S I AL I S L WGG I GVMN I ML VS VTERT KE I G I RMA I GARRGN I XQQ FL I E AVL I CV I GG 

MIMMIIMMMMIMMIMMIMIMMIMMIMI 1 1 1 1 1 1 1 1 1 1 1 = 1 1 ! 

orf 134ng I S S I AL I S L WGG I GVMN I MLVS VTERTKE I G I RMA I GARRGN I LQQ FL I EAVL I C 1 1 GG 324 
orf 134 . pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150

I III II II II II II I II Ill-Ill II II I II MM II 1 1 MINIM IIMIII I Mill 

orf 134ng LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384 

orf 134. pep LAQD 154 

MM 

orfl34ng LAQD 388 

The complete length ORF134ng nucleotide sequence (SEQ ID NO: 537) is:

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG 

151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAAAATCAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA

251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA 

401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA
651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA 
701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC 
751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA 
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 
951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG 

1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 538): 



1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT
51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT
101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK
151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM
201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI
251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA
301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS
351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD*



ORF134ng (SEQ ID NO: 538) and ORF134-1 (SEQ ID NO: 534) show 97.9% identity in 388 aa
overlap:



orf 134ng MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSMGTNTISIFPGRG 

I II 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M h 1 1 1 1 1 1 1 1 M I ! 

orf 134-1 MSVQAVLAHKMRSLLTMLGIIIGIASWSWALGNGSQKKILEDISSIGTNTISIFPGRG 
orf I34ng FGDRRSGKIKTLT I DDAK 1 1 AKQSYVASAT PMTSSGGTLT YRNTDLTASL YGVGEQYFDV 

II II M h M 1 1 M II 1 1 II II 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 II II I II 1 1 M I II 1 1 M I II 

orf 134-1 FGDRRSGR I KTLT I DDAK I I AKQSYVASATPMTSSGGTLT YRNTDLTASL YGVGEQYFDV 
orf 134ng RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD

1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M I M I II I M 1 . 1 1 1 1 1 II I . M 1 1 1 1 1 M 1 1 1 1 M M I 

orf 134 - 1 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 
or f 1 3 4 ng ENAFGNSDVLMLWS PYTTVMHQ I TGESHTNS I TVKI KDNANTRVAEKGLAELLKARHGTE 

I II 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 M 1 1 1 M 1 1 M 1 1 II I Mhl 1 1 1 1 l-l 1 1 1! I M I 

orf 134-1 ENAFGNSDVLMLWS PYTTVMHQ I TGESHTNS I TVKI KDNANTQVAEKGLTDLLKARHGTE 



10 



orf 134ng DFFMNNSDSIRQMVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

1 1 1 1 ' 1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Ml 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M 1 1 1' 1 1 

orf 134-1 DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 

orf 134ng I GARRGN I LQQFL I EAVLI C I I GGLVGVGLS AAVSLVFNHFVTDFPMD I S AAS V IGAVAC 

I I I I I I I I I I M I I I I I I h I I I I I I I I I I I I I I I I I I I I II I I M I I I I llllllll 
orf 134 - 1 I GARRGN I LQQFL I EAVLI CV I GGLVGVGLS AAVS LVFNHFVTDFPMD I SAMS VI GAVAC 



orfl34ng STGIG I AFGFMP ANKAAKLNP I DALAQDX 

1 1 1 1 1 ; I I I I I I I I I I I I M I I I I I II I 
15 orfl34-l STGIGIAFGFMPANKAAKLNPIDALAQDX . 

ORF134ng (SEQ ID NO: 538) also shows homology to an E. coli ABC transporter (SEQ ID NO:
1145):






sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ ) gi5
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 648
Score = 297 bits (753), Expect = 6e-80 

Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%) 

Query: 1 MSVQAVLAHKMRSLLTMLXXXXXXXXXXXXXXLGNGSQKKILEDI SSMGTNT I S I FPGRG 60 

M+ +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI + + PG+ 

Sbjct: 260 MAWRALAANKMRTLLTMLG III G I AS WS I VWGDAAKQMVLAD I RS I GTNT IDVYPGKD 319 

Query: 61* FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 120 

FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V 

Sbjct: 32 0 FGDDDPQYQQALKYDDLIAIQKQPWVASATPAVSQNLRLRYNNVDVAASANGVSGDYFNV 379 

Query: 121 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFAD-SDPLGKTILFRKRPLTVIGVMKK 179 

G+ G F++ + AQVW+D N + +LF +D +G+ IL P VIGV + + 
Sbjct: 3 80 YGMTFSEGNTFNQEQLNGRAQVWLDSNTRRQLFPHKADWGEVILVGNMPARVIGVAEE 43 9 

Query: 180 DENAFGNSDVLMLWS PYTTVMHQ I TGESHTNS I TVKI KDNANTRVAEKGLAELLKARHGT 239 

++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ AE+ L LL RHG 
Sbjct: 440 KQSMFGS S KVLRVWLP YSTMSGRVMGQS WLNS I TVRVKEGFDS AEAEQQLTRLLS LRHGK 499 

Query: 240 ED F FMNNS D S I RQMVES TTGTM KXXXXXXXXXXX WGG I GVMN I MLVS VTERT KE I G I RM 299 

+DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EIGIRM 
Sbjct: 500 KDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREIGIRM 559 

Query: 300 AIGARRGNILQQFLIEXXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAASVIGAVA 359 

A+GAR ++LQQFLIE F+ + + S +++ A 

Sbjct: 560 AVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALLLAFL 619 

Query: 360 CS TG I G I AFG FM P ANKAAKLNP I D ALAQD 388 

CST GI FG++PA AA+L+P+DALA++ 
Sbjct: 620 CSTVTGI LFGWLPARNAARLDPVDALARE 648 
Based on this analysis, including the presence of the leader peptide and transmembrane regions in
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
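
The text does not state which program predicted the leader peptide and transmembrane regions. One common way to flag candidate membrane-spanning segments is a Kyte-Doolittle hydropathy scan, and the sketch below illustrates that approach as an assumption, not as the method used here. The 19-residue window and the 1.6 cutoff are conventional defaults rather than values taken from this document, and the function name `hydrophobic_windows` is hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, cutoff=1.6):
    """Yield (1-based start, mean hydropathy) for each window above the cutoff."""
    # Unknown symbols such as X or * are scored as neutral (0.0).
    scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein.upper()]
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean > cutoff:
            yield start + 1, round(mean, 2)


# Residues 1-50 of ORF134ng, as translated from SEQ ID NO: 537.
n_terminus = "MSVQAVLAHKMRSLLTMLGIIIGIASVVSVVALGNGSQKKILEDISSMGT"
for start, mean in hydrophobic_windows(n_terminus):
    print(start, mean)
```

Windows that exceed the cutoff near the N-terminus are consistent with a hydrophobic leader or membrane-spanning segment, but a scan of this kind is only a screening aid.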

Example 65 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 539): 

1 ..GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT 

251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC 

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG
451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA
501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 
551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CgTCAGCGGT 
601 ATTTTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 540; ORF135): 

1 ..GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV
51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM
151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV
201 F*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 541): 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 



This corresponds to the amino acid sequence (SEQ ID NO: 542; ORF135-1): 
1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS
51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT
201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR
301 *

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF135 (SEQ ID NO: 540) shows 99.0% identity over a 197aa overlap with an ORF (ORF135a) 
(SEQ ID NO: 544) from strain A of N. meningitidis: 

10 20 30 

orf 135 .pep GTGAMLLLFYAVTILPLATGVTLSYTSSIF 

15 lllllllllllll 1 1 ' 1 1 M 1 1 1 M I M I 

orf 135a STVALGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSI F 

50 60 70 80 90 100 

40 50 60 70 80 90 

orf 13 5 . pep LAVFS FLILKERI SVYTQAVLLLGFAGWLLLNPS FRSGQETAALAGLAGGAMSGW A YLK 

20 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 , 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ' 1 1 1 1 I Mil 1 1 1 1 1 

orf 135a LAVFS FLILKERI SVYTQAVLLLGFAGWLLLNPS FRSGQETAALAGLAGGAMSGWAYLK 

110 120 130 140 150 160 

100 110 120 130 140 150 

' orf 135 .pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 

25 I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I 

orf 13.5a VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
170 180 190 200 210 220 

160 170 180 190 200 

orf 135 .pep TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCI I ISAVFX 
30 | | | | | | | | M | | | | | | | | | | | | | | | | | | | | | : | | | | | | | | | | | | | | | 

orf 13 5a TRAYKVGDKFTVASLSYMTWFSALSAAFFLAEELFWQEILGMCI I ILSGILSSIRPTAF 

230 240 250 260 270 280 
orf 135a KQRLQSLFRQRX
290 300 



The complete length ORF135a nucleotide sequence (SEQ ID NO: 543) is: 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 
551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA GAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 544):



1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS
51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT
201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR
301 *

ORF135a (SEQ ID NO: 544) and ORF135-1 (SEQ ID NO: 542) show 99.3% identity in 300 aa
overlap:

orf 13 5a . pep MDTAKKD I LGSGWMLVAAACFT I MNVL I KEASAKFALGSGELVFWRMLFS TVALGAAAVL 

I 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 II ! 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 , 1 1 1 II 1 1 1 ! 

orf 135-1 MDTAKJCD I LGSGWMLVAAACFT I MNVL I KEASAKFALGSGELVFWRMLFS TVALGAAAVL 

orf 13 5a. pep RRDTFRTPHW KNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSS I FLAVFSFL I LKE 
25 | | | : | | | | | | | M | | | M | | | M | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | M | | 

or f 13 5 - 1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLS YTSS I FLAVFSFL I LKE 

orf 13 5a . pep RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 
I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I i II I I I I I I I I I I I I I I I I I I I I I I ! 
orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

30 orf 135a .pep WRWFYLS VTGVAMSS VWATLTGWHTLS FPS AVYLS C I GVS AL I AQLSMTRAYKVGDKFT 

lllllllllllllllllllllll I IIIIIIIIMIIIIIIIIIMI IIIIIIIIM 
orfl35-l WRWF YLS VTGVAMS S VWATLTGWHTLS FPS AVYLS C I GVS AL I AQLSMTRAYKVGDKFT 

orf 135a. pep VASLSYMTWFSALSAAFFLAEELFWQEILGMCI IILSGILSSIRPTAFKQRLQSLFRQR 
I I I II I I I I I I I I I I I I I h I I M II I II I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I 
35 orf 135-1 VASLSYMTWFSALSAAFFLGEELFWQEILGMCI IILSGILSSIRPTAFKQRLQSLFRQR 

Homology with a predicted ORF from N. gonorrhoeae 

ORF135 (SEQ ID NO: 540) shows 97% identity over a 201 aa overlap with a predicted ORF
(ORF135ng) (SEQ ID NO: 546) from N. gonorrhoeae:
orf 135 .pep
orf 135ng 



GTGAMLLLFYAVTXLPLATGVTLS YTSS I F 

Illllllllllll llhllllllllMII 
STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLS YTSS I F 



30 
335 






orf 135 .pep LAVFS FLI LKERI SVYTQAVLLLGFAGWLLLNPS FRSGQETAALAGLAGGAMSGWAYLK 90 

I I I I I I I II II I I I M I I I M I II I I ! I I I I I I < I I I I llllllllllllllllll 
orf 135ng LAVFS FLILKERISVYTQAVLLLGFAGVVLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395 

orf 135 . pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150 

I I I I I II I ! II I I I I I h I 1 1 I I I I I I I I I I I ' II I I I I I I I I I I I I 1 I I II . I I I 
orf 135ng VRELSLAGEPGWRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSM 4 55 

orf 135 .pep TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCI I ISAVF 201 

Ihllllll IIIIIIIIIM lllllllllllll IIIMI Mlll|:| 
orf 135ng TRAYKVGDKFTVASLSYMTWFSALSAAFFLGEELFWQEILGMCI I ISAAF 506 

An ORF135ng nucleotide sequence (SEQ ID NO: 545) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 546): 



1 MPSEKAFRRH LRTASFQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPAAQG
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL
101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH
201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGSGWM LVAAACFTVM
251 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL
301 NRSMVGTGAM LLLFYAVTHL PLTTGVTLSY TSSIFLAVFS FLILKERISV
351 YTQAVLLLGF AGVVLLLNPS FRSGQEPAAL AGLAGGAMSG WAYLKVRELS
401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLSFPSAVY LSGIGVSALI
451 AQLSMTRAYK VGDKFTVASL SYMTVVFSAL SAAFFLGEEL FWQEILGMCI
501 IISAAF*



Further work revealed the following gonococcal sequence (SEQ ID NO: 547): 



1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG 

651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca 

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttcctaTAt gaccgtcGTC 

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA 

801 GGAAATACTC GGTATGTGCA TCATTAtCCT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 



This corresponds to the amino acid sequence (SEQ ID NO: 548; ORF135ng-1):



1 MDTAKKDILG SGWMLVAAAC FTVMNVLIKE ASAKFALGSG ELVFWRMLFS
51 TVTLGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLTTGV
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE
151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSAT GVAMSSVWAT
201 LTGWHTLSFP SAVYLSGIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV
251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPIAFK QRLQALFRQR
301 *

ORF135ng-1 (SEQ ID NO: 548) and ORF135-1 (SEQ ID NO: 542) show 97.0% identity in 300 aa
overlap:



orf 135ng-l .pep MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFSTVTLGAAAVL 

Mlllll! IIIMIilllMlhllMIMIIIIIIMIIIIM MM lllhlllllll 

orf 135-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

orf 135ng-l .pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 

10 I II : I I I I I I I I I I II M M I I I I I I I I I I I I I I M : I I I I I I I II I I I I I I I I I I I I I I 

orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSS I FLAVFSFLI LKE 

orf 13 5ng-l .pep RISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG 

MMMMMMMMMMMMMMM M M M M M M M M M M M M M M I 

or f 1 3 5 - 1 RIS VYTQAVLLLGFAGWLLLNPS FRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

15 orf 135ng-l .pep WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT 

1 1 1 1 1 I M 1 1 ,1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 ! 1 1 1 1 1 MINIMI MINIM II 

orf 13 5-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 
orf 135ng-l.pep VASLSYMTWFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR 

MIMIIIMIMIMIMI IIIIIMI llllllll llllll llllllhlllll 

20 or f 1 3 5 - 1 VASLS YMTWFSALSAAFFLGEELFWQE I LGMC IIILSGILSSI RPTAFKQRLQS LFRQR 

Based on this analysis, including the presence of several putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 66



The following DNA sequence was identified in N. meningitidis (SEQ ID NO: 549): 



1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT
51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA
101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT
151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC
201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG
251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG
301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT
351 TGTTCAGCAC ACCGTAAATA TAAAGACCGT CAAAATAAAT ATCGTCGATC
401 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC
451 TTTGACCATG GCAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA
501 AAAGcTCGCG CCAAAAATAT TTGAATGTTT TACGGGCGCG TTCGTCGGCA
551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC
601 CATCAT^TCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG
651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG
701 GAATAG



This corresponds to the amino acid sequence (SEQ ID NO: 550; ORF136): 
1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY
51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADVVNR
101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD
151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA
201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE*
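
The X residues in ORF136 line up with lowercase letters such as m, r, w and y in the partial DNA sequence. Assuming those letters are IUPAC nucleotide ambiguity codes (the text does not say so), a codon can be resolved only when every base combination it allows encodes the same amino acid. The sketch below illustrates this; the function name `translate_ambiguous` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes mapped to the bases they can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1, emitting 'X' where a codon is not unique."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Any unrecognised symbol is treated like N (any base).
        options = [IUPAC.get(base, "ACGT") for base in dna[i:i + 3]]
        residues = {CODON_TABLE["".join(c)] for c in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Last 30 bases of the partial sequence (SEQ ID NO: 549), in frame.
tail = "GATTTGAAAAGTTCmmrwyATTCGGAATAG"
print(translate_ambiguous(tail))  # DLKSSXXSE*
```

The result agrees with the C-terminal residues listed for ORF136: the codon `TCm` still resolves to serine, while `mrw` and `yAT` do not resolve and appear as X.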

Further work revealed the complete nucleotide sequence (SEQ ID NO: 551): 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

15 351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence (SEQ ID NO: 552; ORF136-1): 

1 MMKRRIAVFV LFPQIIRVLG QLLPKIVNTV PAHRMLFQIF GMFFFFIHQQ
51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN
101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR
151 DFDHGKIQGG NNAAAFPKKL APKIFECFTG AFVGTVYRFV CLFYIINDGI
201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF136 (SEQ ID NO: 550) shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) 
(SEQ ID NO: 554) from strain A of N. meningitidis: 

35 10 20 30 40 50 59 

orf 136 .pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

III 1 1 Ml II: I I hi 1 1 III MM II II 1 1 MM III Ml MINIM III 

orf 136a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 

10 20 30 40 50 60 

40 60 70 80 90 100 110 119 

orf 136 .pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 

Mill MUM : 1 1 1 1 1 1 1 II h II 1 1 II 1 1 II I II 1 1 II 1 1 II II 1 1 1 1 1 MM 

orf 136a PCG I VFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFD IGQFAGFIVQ 

70 80 90 100 110 120 

45 120 130 140 150 160 170 179 

orf 136 . pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
MMM II I M M I I M M I I I I I I I I I : M • Ml-:: 
orf 136a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKX

130 140 150 160 170 180 

180 190 200 210 220 230 

orf 136 . pep AFVGTVYRFVCLFYIINDGIAHH- - -SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX 

5 : || : | : - I I I I I I I II I I I I I I I I I I I I I I I II I I Ml 

orf 136a R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence (SEQ ID NO: 553) is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG
251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG
451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA 

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG 
601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 
651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 
701 CGGAATAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 554):

1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQXF GMFFFFIHQQ
51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADVVN
101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR
151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES
201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

ORF136a (SEQ ID NO: 554) and ORF136-1 (SEQ ID NO: 552) show 73.1% identity in 238 aa 
overlap: 

10 20 30 40 50 60 

35 orf 136a. pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 

I ! 1 I I I ' I I I : I I h I I I I I I I II I I I I I I I I I I Ulllll IMMIMIII 
orf 136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

10 20 30 40 50 60 

70 80 90 100 110 120 

40 orf 136a . pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFD IGQFAGFIVQ 

MINIMUM : 1 1 1 1 1 i 1 1 1 M M 1 1 1 M I M I M 1 1 1 M 1 1 1 1 1 1 1 1 M 1 1 

orf 136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFD IGQFAGFIVQ 

70 80 90 100 110 120 

130 140 150 160 170 180 

45 orf 136a . pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 

h:| = III II II I M I I M I M I Ml Ml = : I : 1 = 

orf 136 - 1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

130 ' 140 150 160 170 180 

190 200 210 220 230 
orf 136a. pep R SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX

1 I : I : ::: I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 

orf 136- 1 AFVGTVYRFVCLFYI INDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae

ORF136 (SEQ ID NO: 550) shows 92.3% identity over a 234aa overlap with a predicted ORF 
(ORF136ng) (SEQ ID NO: 556) from N. gonorrhoeae: 

orf 136 .pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59 

MINIUM: | 1 1 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 = M 1 1 1 1 1 1 1 M 

10 orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS '60 

orf 136 .pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 119 

I MMhIIIIM Mllllllll llllil MlhlMIIIIIIIIIII I MM 

orf 13 6ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 12 0 

orf 136 .pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179 

15 1 1 M 1 1 1 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 i M 1 1 1 1 I! M I ! 1 1 1 1 M 1 1 1 1 1 1 : 1 1 1 1 1 1 

orf 136ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 180 

orf 13 6 .pep AFVGTVYRFVCLFYI INDG I AHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234 

I M I I I I II I I II II I I I II I II • II I M I II I I I I I I I I II II I I I I I M 
orf 13 6ng AFAGTVYRFVCLFYI INDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 23 5 
The complete length ORF136ng nucleotide sequence (SEQ ID NO: 555) is:



1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 

651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 556): 



1 MMKRRIAVFV LLMQKIRILG QLLPKIVNTV PAHRMLFQIF GMFFFFIHRQ 

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVGDAVA HEHPVADVAN 

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI 

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE* 
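
The relationship between a nucleotide listing and the protein it encodes can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the helper names `clean` and `translate` are illustrative and do not come from the specification. It translates the first row of the ORF136ng nucleotide sequence (SEQ ID NO: 555) and reproduces the start of SEQ ID NO: 556.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (assumption: no alternative table is needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def clean(listing):
    """Keep only letters, upper-cased; drops position numbers and spaces."""
    return "".join(ch for ch in listing.upper() if ch.isalpha())


def translate(nucleotides):
    """Translate in frame 1; codons containing ambiguous bases become 'X'."""
    seq = clean(nucleotides)
    usable = len(seq) - len(seq) % 3  # ignore an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row of the ORF136ng listing (50 nt, so 16 complete codons).
first_row = "1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG"
print(translate(first_row))  # MMKRRIAVFVLLMQKI
```

Rows that do not start on a codon boundary have to be concatenated with the preceding rows before translation, since the frame is counted from position 1 of the whole sequence.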
ORF136ng (SEQ ID NO: 556) and ORF136-1 (SEQ ID NO: 552) show 93.6% identity in 235 aa 
overlap: 



orf 136ng MMKRRIAVPVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFPIHRQYLPGIAEIDS 

I I I I I I I I I I I : I I I = I II I I I I I I I II M M I I I I I I I I I I I I I 1 = I I I I I I II I I I 

orf 136-1 MMKRRIAVFVLFPQI IRVLGQLLPKIVNTVPAHRMLFQI FGMFFFFIHQQYLPGI AEIDS 

orf 136ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 

I MM hill III I II MINI II MUM III hill III MM II II I III II 

orf 136-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 

orf 13 6ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I II II I II : I I I I I I 
orf 136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

orf 136ng AFAGTVYRFVCLFYI INDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX ' 

I M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 M I M 1 1 1 1 1 1 llllllllllllll 

orf 136 - 1 AFVGTVYRFVCLFYI INDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 
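
The identity figures quoted for these alignments are the fraction of aligned columns in which both sequences carry the same residue. The following is a minimal sketch of that calculation for two rows that are already aligned; the function name `percent_identity` is illustrative, and the example covers only the final segment of the alignment above, so its figure differs from the 93.6% quoted for the whole 235 aa overlap.

```python
def percent_identity(row_a, row_b):
    """Return (identical columns, aligned columns, percent identity).

    Both rows must already be aligned to the same length; '-' marks a gap
    and never counts as an identity. Spaces are ignored so that rows can be
    written in groups of ten as in the listings.
    """
    a = row_a.replace(" ", "")
    b = row_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return identical, len(a), 100.0 * identical / len(a)


# Final alignment segment, without the trailing stop symbol.
orf136ng = "AFAGTVYRFV CLFYIINDGI AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE"
orf136_1 = "AFVGTVYRFV CLFYIINDGI AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE"

identical, columns, percent = percent_identity(orf136ng, orf136_1)
print(identical, columns, round(percent, 1))  # 51 55 92.7
```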

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 67 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 557): 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC.TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

401 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC. . 



This corresponds to the amino acid sequence (SEQ ID NO: 558; ORF137): 



1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAVVGLAL 
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE 
101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 559): 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 






251 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 
351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 
451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAACATCAG CCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 






This corresponds to the amino acid sequence (SEQ ID NO: 560; ORF137-1): 



1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAVRKPVQTA KPAAVVGLAL 
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE 
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPGKNISQGF FSYLDQTLNV MSVSALQNEL 
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 
301 * 
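
The numbered listings lend themselves to a simple consistency check: each row should begin where the previous row ended, and a complete open reading frame should have a length divisible by three and end in a stop codon. The following is a minimal sketch of such a check, applied to the last two rows of the ORF137-1 nucleotide sequence (SEQ ID NO: 559); the names `parse_rows` and `rows_are_contiguous` are illustrative and not part of the specification.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def parse_rows(rows):
    """Return (start_position, sequence) for each 'position groups...' row."""
    parsed = []
    for row in rows:
        match = re.match(r"\s*(\d+)\s+(.*)", row)
        if match is None:
            continue  # not a numbered listing row
        sequence = re.sub(r"[^A-Za-z]", "", match.group(2)).upper()
        parsed.append((int(match.group(1)), sequence))
    return parsed


def rows_are_contiguous(parsed):
    """True if every row starts exactly where the previous row ended."""
    return all(
        pos + len(seq) == next_pos
        for (pos, seq), (next_pos, _) in zip(parsed, parsed[1:])
    )


rows = [
    "851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT",
    "901 TGA",
]
parsed = parse_rows(rows)
total_length = parsed[-1][0] + len(parsed[-1][1]) - 1
last_codon = "".join(seq for _, seq in parsed)[-3:]

print(rows_are_contiguous(parsed))   # True
print(total_length)                  # 903
print(total_length // 3 - 1)         # 300 residues before the stop codon
print(last_codon in STOP_CODONS)     # True
```

A length of 903 nucleotides corresponds to 300 codons plus a stop codon, which agrees with the 300 aa overlaps reported for ORF137-1.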

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF137 (SEQ ID NO: 558) shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) 
(SEQ ID NO: 562) from strain A of N. meningitidis: 



10 20 30 40 50 60 

30 or f 13 7 . pep MENMVTF S KIR P LLA I AAAALLAAXRTAGNNA VRKP VQT AKP AA WGLALGGGAS KGFAH 

I I I I M I I I I M I I I I I I I I I I 1 I I I I I : I I ! I I I I I I I I I M I I I M U I I I I I 
orf 13 7a MENMVTFS KIRPLIAI AAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGAS KGFAH 

10 20 30 40 50 60 



70 80 90 100 110 120 

VG 1 1 KVLKENG I PVKWTGTSAGS I VGNLFASGMS PDRLELEAE I LGKTDLVDLTLSTNG 
I I I I I I I I I I I I I I I I M I I I I I I I ^ I I I I I I M I i I I I I I I I I I I M I hi II h I 
VG 1 1 KVLKENG I PVKWTGTSAGS I VGSLFASGMS PDRLELEAE I LGKTD LVDLTLSTSG 
70 80 90 100 110 120 

130 140 149 

F I KGAKLQN Y I NRKLRGMQ I QQF P I KF AA 
I I II llllllllh I :||llllllll 

F I KGEKLQNY I NRKVGGRR I QQ F P I KFAAVATD FE TGKAVAFNQGNAGQAVRAS AA I PNV 
130 140 150 160 170 180 

The complete length ORF137a nucleotide sequence (SEQ ID NO: 561) is: 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 
51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC 



101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 562): 



1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 

ORF137a (SEQ ID NO: 562) and ORF137-1 (SEQ ID NO: 560) show 97.3% identity in 300 aa 
overlap: 






orf 13 7a . pep MENMVTFSKI RPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 

I M I I I I M I I I I I I I ' i i I M I I I I I I I • h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i II 

or f 13 7 - 1 MENMVTFS ICI RPLLAI AAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGAS KGFAH 






orf 137a . pep VG 1 1 KVLKENGI P VKWTGTS AGS I VGS LFASGMS PDRLELEAE I LGKTDLVDLTLS TSG 

- 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 ■ 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 II 1 1 ! I! 1 1 1 1 1 1 1 1 1 1 1 1; 1 1 

orf 137-1 VGI I KVLKENGI PVKWTGTS AGS I VGS LFASGMS PDRLELEAE I LGKTDLVDLTLSTSG 






orf 13 7a. pep F I KGEKLQNY INRKVGGRRI QQFP I KFAAVATDFETGKAVAFNQGNAGQAVRAS AAI PNV 

III llllllllllll MIIIIIMIIIMMIIIIIIIII IIIIIIIIIIU III 

or f 1 3 7 - 1 FI KGEKLQNYINRKVGGRQ IQQFP I KFAAVATDFETGKAVAFNQGNAGQAVRAS AAI PNV 

orf 137a. pep FQPVI IGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV 

I II M I I I I I I I I I I I M II I I I I I I I I I I I I I h' II I I I I I I I I II I I I 

orf 13 7 - 1 FQPVI IGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 



orf 13 7a. pep MSVSALQNELGQADW I KPQVLDLGAVGGFDQKKRA I RLGEEAARAALPE IKRKLAAYRY 

I I I I : I I I II II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
or f 1 3 7 - 1 MS VSALQNELGQADWI KPQVLDLGAVGGFDQKKRA I RLGEEAARAALPE I KRKLAAYRY 



Homology with a predicted ORF from N. gonorrhoeae 
ORF137 (SEQ ID NO: 558) shows 89.9% identity over a 149aa overlap with a predicted ORF 
(ORF137ng) (SEQ ID NO: 564) from N. gonorrhoeae: 

orfl3 7.pep MENMVTFSKIRPLLAIAAAALIAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 60 

IIIMIIIIII IIIMIIIIII lllllhlllllMlllllhlllMIIIIIIII 
orfl37ng MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 60 

orf 137 .pep VG 1 1 KVLKENGI PVKWTGTSAGS I VGNLFASGMS PDRLELEAE I LGKTDLVDLTLSTNG 120 

M Ml II I I I I I M I I I I I I I I ! II NMM I I I I I M M I I I i M I I I III I I II Ml 
orf 13 7ng IGIVKVLKENGI PVKWTGTSAGS IVGSLLASGMS PDRLELEAE I LGKTDLVDLTLSTSG 12 0 



orf 137 .pep F I KGAKLQNY I NRKLRGMQ I QQFP I KFAA 149 

1 1 1 1 MINIMI: | MMMMMI 

orf 137ng FI KGEKLQNYINRKVGGRQ IQQFP IKFAAVATDFETGKAVAFNQGNAGQAVRASAAI PNV 180 

The complete length ORF137ng nucleotide sequence (SEQ ID NO: 563) is: 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 
301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 
451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 
501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG 
551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 
601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 
651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 
701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG 
751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 
801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG 
851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 
901 TGA 



This encodes a protein having amino acid sequence (SEQ ID NO: 564): 



1 MENMVTFSKI RSFLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVALAL 

51 GGGASKGFAH IGIVKVLKEN GIPVKVVTGT SAGSIVGSLL ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL 

251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 



ORF137ng (SEQ ID NO: 564) and ORF137-1 (SEQ ID NO: 560) show 96.0% identity in 300 aa 
overlap: 



orf 137ng MENMVTFSKI RSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 

MMMMMI MMMMMMMMMMMMMMMMMMMMMMMM 

orf 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAVVGLALGGGASKGFAH 
orf 13 7ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 
:| I I I I I I I I I I I I I I I I I I i I I I I I I M I I I M II I I I I I I I | I I I | | I I I I I II II 
orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orfl37rig FI KGEKLQNY INRKVGGRQ IQQFP I KFAAVATDFETGKAVAFNQGNAGQAVRASAA I PNV 

I ' I I II I I ! I I I M I I I I I I I I I I I I I I I I I I I I II II I I I I I I M I I I I I I I I I I I 
orf 137-1 FI KGEKLQNY INRKVGGRQ I QQFP I KFAAVATD FETGKAVAFNQGNAGQAVRAS AAI PNV 

orf 137ng FQPVIIGRHKYVDGGLSQPVPVSAARRQGANFVIAVDISARPSKNVGQGFFSYLDQTLNV 
Illllllll I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I h I I - I I. II I II I I I I I 
orf 13 7 - 1 FQPVI IGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV 

orf 13 7ng MSVSVLQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

M ' h I I I I I I I I I I I I II I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
or f 1 3 7 MS VSALQNELGQADWI KPQVLDLGAVGGFDQKKRAI RLGEEAARAALPE I KRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site 
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 68 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 565): 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 
351 ACACGAAGGG CTGCTATTC . . 

This corresponds to the amino acid sequence (SEQ ID NO: 566; ORF138): 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 
51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 
101 MFKAVHGWEH VQQALDKHEG LLF 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 567): 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 
451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 


This corresponds to the amino acid sequence (SEQ ID NO: 568; ORF138-1): 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF138 (SEQ ID NO: 566) shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) 
(SEQ ID NO: 570) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf 13 8 .pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 

25 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 

orf 13 8a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 138 .pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

30 1 1 II 1 1 1 1 1 1 II 1 1 1 1 M 1 1 II 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 

orf 138a MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

70 80 90 100 110 120 

orf 138. pep LLF 

35 Ml 

orf 138a LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

130 140 150 160 170 180 

The complete length ORF138a nucleotide sequence (SEQ ID NO: 569) is: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA 

201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 

851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 570): 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGMNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a (SEQ ID NO: 570) and ORF138-1 (SEQ ID NO: 568) show 99.7% identity over a 298aa 
overlap: 



orf 13 8a. pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

1 1 1 1 M 1 1 1 1 1 1 M 1 . 1 1 M M M 1 1 1 M 1 1 II 1 1 1 1 1 1 1 1 1 1 M 1 1 M II 1 1 1 1 M 

orf 138-1 MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
orf138a.pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

MM : Mill I III IIIIIIIIIIIMIIIIIIIIIIIIIIIIIIIIIIIIMI 

orf 138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf 13 8a. pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

I I I I M I M I I I II I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf138-1 LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

orfl38a.pep VKQ 1 1 KALRSGEAT I VLPDHVPS PQEGGEGVWVD F FGKPAYTMTLAAKLAHVKGVKTLFF 

II I M I M M M 1 1 M 1 1 M M 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I II M II 1 1 1 

orf 13 8 - 1 VKQI I KALRSGEAT I VLPDHVPS PQEGGEGVWTOFFGKP A YTMTLAAKLAHVKGVKTLFF 

orf 13 8a. pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

35 < 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 

or f 13 8 - 1 CCERLPGGQG FDLH I RP VQGELNGDKAHDAAVFNRNAE YW I RRFPTQYLFMYNRYKMP 

Homology with a predicted ORF from N. gonorrhoeae 

ORF138 (SEQ ID NO: 566) shows 94.3% identity over a 123aa overlap with a predicted ORF 
(ORF138ng) (SEQ ID NO: 572) from N. gonorrhoeae: 



orf138.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60 

I I Ml IMIIIIIIIMIIIIMI 1 1 1 1 1 1 II M I II II M 1 1 1 1 M M 1 1 

orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60 
orf138.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120 

Illllllll MINIMUM I : I I I I I h I I II I I li I I I I I I M MM I I I I M 
orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120 

orf 138. pep LLF 123 
III 

orf 138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180 

The complete length ORF138ng nucleotide sequence (SEQ ID NO: 571) is: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA 
351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC 
451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 
501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA 
551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC 
601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA 
651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG 
701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC 
751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA 
801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC 
851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA 



This encodes a protein having amino acid sequence (SEQ ID NO: 572): 



1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET 

101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH 

201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF 

251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP* 

ORF138ng (SEQ ID NO: 572) and ORF138-1 (SEQ ID NO: 568) show 94.3% identity over 
overlap: 



orf 13 8 - 1 . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

I I II II II I II III I I II II II I II II II I lllllllllllllllllllllllllllll 
orf 138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

orf 13 8 - 1 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

Illllllll MMMMMM MMIMMMMMMMMMI MMMM II 

or f 1 3 8 ng MRQAGLNPDTQTVKAVFAETAKCGLELAP AF FKKPED I ETMF KAVHGWEHVQQALDKGEG 

orf 138-1 .pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 

MIMMMMMMMMMM 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 = 1 1 1 

orf 13 8ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 
orf 138-1 .pep VKQI IKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 

Illlllllhllllhllllllllllll MMMMMI MIMI MIMMIMI 

orf 13 8ng VKQI IKALRAGEATI ILPDHVPSPQEGG-GWADFFGKPAYTMTLAAKLAHVKGVKTLFF 
orf 13 8 - 1 . pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP 

Mill 1 1 1 1 III MINI, MINIMUM: MIMUMIIIIIII I 

orf 138ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP 

In addition, ORF138ng (SEQ ID NO: 572) is homologous to htrB protein (SEQ ID NO: 1147) 
from Pseudomonas fluorescens: 

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253 
Score = 80.8 bits (196), Expect = 9e-15 

Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%) 

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159 
           + + V G E +++AL  G+G++ IT H+G+++ L   Y SQ  P      Y+PPK+KA+D 
Sbjct: 94  LVREVEGLEVLKEALASGKGVVGITSHLGNWEVLNHFYCSQCKPI---IFYRPPKLKAVD 150 

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219 
           ++++  RV+   K A +  +G+  +IK +R G    I  D  P P E  G++  FF   A 
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208 

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250 
            T      +        +F    RLPDG G+ 
Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239 

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF138-1 (SEQ ID NO: 568) (57kDa) was cloned in the pGex vectors and expressed in E. coli, as 
described above. The products of protein expression and purification were analyzed by SDS-PAGE. 
Figure 14A shows the results of affinity purification of the GST-fusion protein. Purified 
GST-fusion protein was used to immunise mice, whose sera were used for ELISA (positive result) 
and FACS analysis (Figure 14B). These experiments confirm that ORF138-1 (SEQ ID NO: 568) is 
a surface-exposed protein, and that it is a useful immunogen. 

Example 69 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 573): 

1 . . GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG 

51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG 

101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG 

151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC 

201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC 

251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT 

301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG 

351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT 
401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG 

451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT 

501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG 

551 CGCGGGCGAT GGTGCTG. . 

This corresponds to the amino acid sequence (SEQ ID NO: 574; ORF139): 

1 . .AWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW 

51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 

101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV 

151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL . . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 575): 



1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT 
351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 
401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 
551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG 
601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 
651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC 
701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 
751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC 
801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT 
851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 
901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 
951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence (SEQ ID NO: 576; ORF139-1): 



1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ 

151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS 

301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG 
351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA 
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL 
501 LDGGEGGKQT ETL* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF139 (SEQ ID NO: 574) shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) 
(SEQ ID NO: 578) from strain A of N. meningitidis: 

10 20 30 

orf 13 9 . pep AWSAGESWRVLMESETWHAVWNTLRFS AAA 

10 II IIIIMIIIIII I Ihll Ml HIM 

pr f 1 3 9a QSVGEYVLLAF AAAVXSVCCLFXLLAIW KAWSAGESWRVLMESETWQAVWNTXRFSAAA 
270 280 290 300 310 320 

40 50 60 70 80 90 

orf 139. pep WAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 

15 MM 1 1 II I MM IIIMIII III III IIIMIII MM I MM IIIIMIIIIII 

orf 13 9a VYAAAVLGWYAAAARRSAWMRGLMFLPFMVSPVCVSAGVLLLXPQWTASLPLLLAMYAL 
330 340 350 360 370 380 

100 110 120 130 140 150 

or f 13 9 . pep LAYPFVA KDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 
20 | | | | | || | | || | | || | || | | | | | || | | || | | | | | | | | || || | | | | | || I II II I I I I I I 

orf 139a LAYPFVA KDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 
390 400 410 420 430 440 

160 170 180 . 189 

orf 13 9 .pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 

25 II 1 1 1 1 1 1 II II II 1 1 1 1 1 1 1 1 I II I M 1 1 II I II 

orf 13 9a GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARA MVLTLLLAAFALGXFLLL DGGEGG 
450 460 470 480 490 500 

The complete length ORF139a nucleotide sequence (SEQ ID NO: 577) is: 

1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN 

351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

401 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC 

701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT 

851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT 
951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

1201 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT 

1401 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 



This encodes a protein having amino acid sequence (SEQ ID NO: 578): 



1 MDGRRWAVWG AFALLPSAFL AAMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLXWRG WQDTPYLLLY GNVFFXLPVL VRAAYQGFVQ
151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVXGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVXSVCCLFX LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT XRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL XPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARAMVLTLLL AAFALGXFLL
501 LDGGEGGKRT ETL*



ORF139a (SEQ ID NO: 578) and ORF139-1 (SEQ ID NO: 576) show 96.5% homology over a
514aa overlap:

orf 13 9a . pep MDGRRWAVWGAFALL P S AFLAAMWAPLWAVAA YDGLAWRAVLSDAYML KRLAWT VFQAA 

III IMIIIMIIIMIIIM IMIII MUM IMMIMIMM Illlllll 

or f 1 3 9 - 1 MDGRRVmWGAFALLPSAFIAVMWAPLWAVAAYDGI^WRAVLSDAYMLKRLAWTVFQAA 
orf 13 9a . pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLAL FGADGLXWRG 

MMMMIMIMMMIMM IMIIIMIIIIIMIMII IMIMIMM III 

or f 1 3 9 - 1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 
orf 13 9a. pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWD I EMPVLRP 

MMIIIIIIII lllllllllllllllllllllll M I M 1 1 1 1 M M II M M I 

or f 1 3 9 - 1 RQDTP YLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWD I EMPVLRP 

orf 13 9a . pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 

MMMMMIMM MIMI MUM MUM MMMMMIMIMI INI 

orf 13 9-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orf 13 9a. pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 

I I I I I I II I I I II j | I I I II I I II I I I I I I I I I I I I I I I I I I I I M I I I II I I I I I I I I 
orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIWKAWS 

orf 13 9a. pep AGE SWRVLME SETWQ A VWNTXRFSAAAVYAAAVLGVVYAAAARRSAWMRG LMFLPFMVSP 

Illlllllllllllllllll Mill IMIIIIIIIIIM Illlllllllllllllllll 

or f 1 3 9 - 1 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVVYAAAARRS AWMRGLMFLPFMVS P 



orf 13 9a. pep VCVSAGVLLLXPQWTASLPLLLAMYALLAYPFVAKDVLSAXDALPPDYGRAAAGLGANGF 

llllllllll IIIIIIIIIIIIIIIIIIIMIIIIIIII MMMMMIMIMI 

orf 13 9-1 VCVSAGVLLL YPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 
orf 13 9a . pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY 

I II I I I I I I I I I I I I .1 I I I I I I I ' II I I I I I I II IIIMIIIIIII I I I I I I I 
orf 13 9- 1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 139a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 

5 Illlllllllllllll llllllllllhllMI 

or f 1 3 9 - 1 ARAMVLTLLLAAFALG I FLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF139 (SEQ ID NO: 574) shows 95.2% identity over a 189aa overlap with a predicted ORF
(ORF139ng) (SEQ ID NO: 580) from N. gonorrhoeae:

10 orf 139. pep AWS AGES WRVLMESETWHAVWNTLRFS AAA 30 

lllllll M 1 1 1 ! 1 1 M 1 1 1 1 1 1 1 1 1 1 1 

orf 13 9ng QSVGEYVLLAFSVAVLSVCCLFPLSAI WKAWS AGES RRVLMESETWQAVWNTLRFS AAA 32 7 

orf 13 9 . pep VYAAAVLGVVYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90 

hill II III III III Ml II hi II II MM Mill III II IIIIIIMIIIIII 

15 or f 1 3 9ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 3 87 

orf 13 9 . pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150 

I I I I I I I I I I I II I II II II I II I I I II I II I I II I I I II I I I II I I I II II I M II I I I 
orf 1 3 9ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 44 7 

orf 13 9 .pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 18 9 

20 I II I MM MM II II lllllll Illlllllllllllll 

orf 13 9ng GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVLTLLLSAFAVCIFLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence (SEQ ID NO: 579) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 580): 

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYNLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRPF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*

Further work revealed a variant gonococcal DNA sequence (SEQ ID NO: 581): 

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG
301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT
351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA
601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC
701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC
801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT
851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCAGT
951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA
1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC
1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC
1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT
1351 GCGGCAACCT TGTTCCTGTC GCGTCCGGAA TGGCAGACGT TGACGACTTT
1401 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA
1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG
1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA

This corresponds to the amino acid sequence (SEQ ID NO: 582; ORF139ng-1):

1 MDGRCWAVRG AFSLLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIVVKAWS
301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGVVYAA AARRLVWMRG
351 LVFLPFMVSP VCVSAGVLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL
501 LDNGEGGKRT ETL*

ORF139ng-1 (SEQ ID NO: 582) and ORF139-1 (SEQ ID NO: 576) show 95.9% identity over a
513aa overlap:

orf 13 9ng MDGRCWAVRGAFSLLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

I II I hi I h i 1 1 1 M 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 MM 1 1 1 1 1 1 1 M I i II 1 1 1 1 1 , 1 

or f 13 9 - 1 MDGRRWVVWGAFALLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
or f 13 9ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

1 1 1 1 1 M I M MM 1 1 1 1 1 1 1 1 M M M M i 1 1 1 1 1 1 1 1 1 1 1 M I It 1 1 I 

orf 13 9 - 1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 
orf 13 9ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

M 1 1 M M I M M M II M 1 1 M M M M 1 1 M Ml I M M I M M M M 1 1 1 M M II 

orf 13 9 - 1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orf 139ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 

M M I I I I I I I I I I I i I I II I I I I II I I I I I I I I I I I II I I I I I I ^ I I I I I I I I 
or f 13 9 - 1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEI YQLVMFELDMAVASVLVWLVLGVTA 
orf 139ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWS 

I ! I I I j I I I I t I 1 I I I I I I I I ! I I I I I : : I L I I t I I I E I I llllllll 

orf 13 9 - 1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAI WKAWS 

orf 139ng AGESRRVLMESETWQAVWNTLRFSAAAVFAAAVLGVVYAAAARRLVWMRGLVFLPFMVSP 

5 || | M IMIIIIIIIII I llllllll hi II II II II Mill I : I I I I I : M I Ml I I 

orf 139 AGESWRVLMESETWQAVWNTLRFSAAAVYAAAVLGVWAAAARRSAWMRGLMFLPFMVSP 

orf 139ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

MINIM II M I II M I M M II M 1 1 1 II M I M I II 1 1 M I M 1 1 1 1 M I MM 

orf 13 9- 1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 
1 0 orf 13 9ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

II MM II I MM 1 1 1 1 ! 1 1 1 II III I II Mill II II II MM I II MM II MM 1 1 

orf 13 9 - 1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLI YAYLGRAGEDNY 

orf 13 9ng ARAMVLTLLLSAFAVCI FLLLDNGEGGKRTETL . 

Mlllllllhllh Mill MIIIMIIII 

1 5 or f 13 9 - 1 ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies.

Example 70 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 583): 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 AACGTTTGGT C...

This corresponds to the amino acid sequence (SEQ ID NO: 584; ORF140):

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT
51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 585): 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA

This corresponds to the amino acid sequence (SEQ ID NO: 586; ORF140-1): 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*
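
The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and a reading frame that starts at the first base; the function name `translate` and the convention of writing `X` for any codon containing an ambiguous base are illustrative choices for this sketch and are not taken from the specification.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored.

    Codons containing a base other than T, C, A or G (for example N)
    are reported as 'X'. A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 60 bases of SEQ ID NO: 585 as listed above.
first_60 = (
    "ATGGACGGCT GGACACAGAC GCTGTCCGCG "
    "CAAACCCTGT TGGGCATTTC GGCGGCGGCA"
)
print(translate(first_60))
```

Applied to the first 60 bases of SEQ ID NO: 585, the sketch reproduces the first 20 residues listed for SEQ ID NO: 586.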

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)



ORF140 (SEQ ID NO: 584) shows 95.4% identity over a 87aa overlap with an ORF (ORF140a)
(SEQ ID NO: 588) from strain A of N. meningitidis:



                    10        20        30        40        50        60
orf140.pep  MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD
            |||||||||||||||||||||||||||||:||||||||||||||||||||||||||||:|
orf140a     MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND
                    10        20        30        40        50        60

                    70        80
orf140.pep  ILVKNFGGTLGGVALLVGLGAMLERLV
            :|||||||||||||||||||||| |||
orf140a     VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF
                    70        80        90       100       110       120

The complete length ORF140a nucleotide sequence (SEQ ID NO: 587) is:

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 

851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 



This encodes a protein having amino acid sequence (SEQ ID NO: 588): 



1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*

ORF140a (SEQ ID NO: 588) and ORF140-1 (SEQ ID NO: 586) show 99.8% identity over a 461aa
overlap:



orf140-1.pep  MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND   60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND   60

orf140-1.pep  ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF  120
              :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF  120

orf140-1.pep  GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG  180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG  180

orf140-1.pep  ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV  240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV  240

orf140-1.pep  VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK  300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK  300

orf140-1.pep  RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC  360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC  360

orf140-1.pep  FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG  420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140a       FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG  420

orf140-1.pep  FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV  461
              |||||||||||||||||||||||||||||||||||||||||
orf140a       FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV  461

Homology with a predicted ORF from N. gonorrhoeae

ORF140 (SEQ ID NO: 584) shows 92% identity over a 87aa overlap with a predicted ORF
(ORF140ng) (SEQ ID NO: 590) from N. gonorrhoeae:



orf 14 0 .pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60 

20 Ml | | | | | | | | || | | | | | | | | | | | | | | | : | | | = | I I | I I I = I I I II I I I I I I I II I I = I 

orf 14 0ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60 



or f 14 0 . pep ILVKNFGGTLGGVALLVGLGAMLERLV 8 7 

: I I I I I I I I I I I I I I I I I M ! I III 
orf 14 0ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120 
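
The identity figure quoted for this overlap can be reproduced by counting matching positions. The sketch below assumes a gap-free overlap in which residue i of ORF140 is compared with residue i of ORF140ng, which is how the two sequences line up over their first 87 residues; the function name `percent_identity` is illustrative, and this is not the alignment program used to generate the figures in the specification.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (no gaps)")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF140 (SEQ ID NO: 584), the 87 residues of the partial sequence.
ORF140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD"
    "ILVKNFGGTLGGVALLVGLGAMLERLV"
)
# First 87 residues of ORF140ng (SEQ ID NO: 590).
ORF140NG_PREFIX = (
    "MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLV"
)

print(round(percent_identity(ORF140, ORF140NG_PREFIX), 1))
```

With these inputs, 80 of the 87 positions agree, which rounds to the 92% stated above. Alignments that contain gaps would need a gapped alignment step before this count is meaningful.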


The complete length ORF140ng nucleotide sequence (SEQ ID NO: 589) was predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 590): 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK
301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLSDM DVPTTLKTWT VNQTLIAFIG
451 FALSALLFAI V*

Further work revealed a variant gonococcal DNA sequence (SEQ ID NO: 591): 

1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG
251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG
851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT
1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA



This corresponds to the amino acid sequence (SEQ ID NO: 592; ORF140ng-1):

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT
51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FAPGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASVGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAKAGTV VAVMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG STPVALLISV LAALLVLGRK
301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIAFIG
451 FALSALLFAI V*

ORF140ng-1 (SEQ ID NO: 592) and ORF140-1 (SEQ ID NO: 586) show 96.3% identity over a
461aa overlap:



orf 14 0ng-l .pep MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 

Ml 1 1 1 1 1 1 1 M 1 1 1 1 M i 1 1 1 1 1 II 1 1 1 h 1 1 1 : 1 1 h 1 1 M 1 1 i M I Mi 1 1 1 

orf 14 0-1 MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 

orf 140ng- 1 . pep VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 

:| I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mlllll 
orf 14 0-1 I LVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADAL I RMFGEKRAPFALGVASL I F 

orf 14 0ng-l .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG 

MMMIMMMIMM IMIMM MMMMMMIMM MIMMI MM 

orf 140-1 GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 
orf 14 0ng-l .pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV 

1 1 1 1 ! M I 1 1 1 1 M 1 . 1 1 ^ II 1 1 1 MIMMI : I MIMMI 

orf 140-1 AN I GQVL I LGLPTAF I TWYFSGYMLGKVLGRT I HVPVPELLSGGTQDNDLPKE PAKAGTV 
orf 14 0ng-l.pep VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK 

I i : 1 I 1 I I I I I I I i I 1 I I I I I k I I I : I I 1 I I ^ 1 I : I I : I L I I t 

orf 14 0-1 VAIMLI PMLLI FLNTGVSALISEKLVSADETWVQTAKI IGSTPIALLISVLVALFVLGRK 

~ orf 140ng-l .pep RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 

5 I I I II I : I I I I M I I I I I : I I I II I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I M 

or f 1 4 0 - 1 RGE S GS ALEKTVDGALAPVCS V I L I TGAGGM FGGVLRASG I GKALADSMADLG I P VLLGC 

or f 14 Ong- 1 . pep FLVALALRI AQGSATVALTTAAALMAPAVAAAGFTDWQLACI VLATAAGSVGCSHFNDSG 

I M 1 1 II 1 1 ! 1 1 1 1 1 1 1 1 1 II I ! 1 1 1 1 Ml 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II M 1 1 1 1 1 1 1 1 1 1 

orf 140-1 FLVALALRI AQGSATVALTTAAALMAPAVAAAGFTDWQLACI VLATAAGSVGCSHFNDSG 

10 orf 140ng-l .pep FWLVGRLLDMDVPTTLKTWTVNQTL I AF I GFALS ALLFA I V 

I I I II I I I I : I I I I I I I I I I I I I I M M I I I I I I I : I I I I 
orf 14 0 - 1 FWLVGRLLDMDVPTTLKTWTVNQTL I AL I GFALS ALL FA IV 

Furthermore, ORF140ng-1 (SEQ ID NO: 592) is homologous to an E. coli protein (SEQ ID NO:
1148):



gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454;
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454
Score = 210 bits (529), Expect = 1e-53
Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%)



Query: 88 ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147 

E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI++ A+ K 
Sbjct: 80 EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFI ILAPI IYGFAKVAKIS 139 






Query: 148 VLPFALASVGAFSVMHVFLPPHPGPIAASEFYGANIGQVLILGLPTAFITWYFSGYMLGK 207 

L F L ■ G +HV +PPHPGP+AA+ A+IG + I+G+ +1 GY K 

Sbjct: 140 PLKFGLPVAGIMLTVHVAVPPHPGPVAAAGLLHADIGWLTI IGIAIS- IPVGWGYFAAK 198 

Query: 208 VLGRAIHVPVPELL SGGTQDSDPPKEPAKAGTWAVMLIPMLLIFLNTGV 257 

+ + + + E+L G T+ SD P A V ++++IP+ +1 T 

Sbjct: 199 IINKRQYAMSVEVLEQMQLAPASEEGATKLSDKINPPGVA-LVTSLIVIPIAIIMAGT-- 255 

Query: 258 SALISEKLVSADETWVQTAKMIGSTPXXXXXXXXXXXXXXGRKRGESGSTLEKTVDGALA 317 

+S L+ + T + + IGS +RG S + AL 

Sbjct: 256 VSATLMPPSHPLLGTLQLIGSPMVALMIALVLAFWLLALRRGWSLQHTSDIMGSALP 312 






Query: 318 PACSVI LI TGAGGM FGGVLRASG I GKALADSMADLG I PVLLGCFLVALALRIAQGSXXXX 377 

A VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS 
Sbjct: 313 TAAWILVTGAGGVFGKVLVESGVGKALANMLQMIDLPLLPAAFIISLALRASQGS--AT 370 



Query: 378 XXXXXXXXXXXXXXXGFTDWQLACIVLATAAGSVGCSHFNDSGFWLVGRLLDMDVPTTLK 437 

G Q + LA G +G SH NDSGFW+V + L + V LK 
Sbjct: 371 VAILTTGGLLSEAVMGLNPIQCVLVTLAACFGGLGASHINDSGFWIVTKYLGLSVADGLK 430 



Query: 438 TWTVNQTLIAFIGFALSALLFAIV 461
           TWTV  T++ F GF ++  ++A++
Sbjct: 431 TWTVLTTILGFTGFLITWCVWAVI 454



Based on this analysis, including the identification of a putative leader sequence
(double-underlined) and several putative transmembrane domains (single-underlined) in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 71 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 593): 

1 ..GATTTCGGCA TATCGCCCGT GTATCTTTGG GTTGCCGCCG CGTTCAAACA
51 TTTGCTGTCG CCGTGGGCTG CCGACTCATA CGATGTCGCA CGCTTTGCAG
101 GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC
151 AACTTTTTGG GCAGACACCA CGGGCGCAC. GTCGTCCTGA TTCTCATCGG
201 CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG
251 CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG
301 CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC
351 GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG
401 TACTGATGTT TTTCCGTCCG..

This corresponds to the amino acid sequence (SEQ ID NO: 594; ORF141): 



1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP...

Further work revealed the complete nucleotide sequence (SEQ ID NO: 595): 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA
51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG
101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG
201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT
251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC
301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC
401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT
451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC
501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC
551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC
601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT
651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC
701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC
751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT
801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC
851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC
901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC
951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC
1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG
1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT
1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG
1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC
1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT
1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG
1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG
1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT
1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA
1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA
1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT
1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG
1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG
1651 GAAAATATAT AA

This corresponds to the amino acid sequence (SEQ ID NO: 596; ORF141-1): 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA
51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR
101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPVAHFLNPA
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP
201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD
251 YHVFGTFGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD
301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP
401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT
501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG
551 ENI*


Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF141 (SEQ ID NO: 594) shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) 
(SEQ ID NO: 598) from strain A of N. meningitidis: 

                                                  10        20        30
orf141.pep                                DFGISPVYLWVAAAFKHLLSPWAADSYDVA
                                          |||| |||||||||||||||||||| || |
orf141a     WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA
           40        50        60        70        80        90

                    40        50        60        70        80        90
orf141.pep  RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL
            ||||||||| ||||||||||||||||||| |||||||||||||  |||||||||||||||
orf141a     RFAGVFFAVVGLTSCGFAGFNFLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGL
          100       110       120       130       140       150

                   100       110       120       130       140
orf141.pep  VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP
            ||||||||||||||||||||||||||||||||||||||||||||||||||
orf141a     VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA
          160       170       180       190       200       210

orf141a     VASLAFALPLMTVYPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWF
          220       230       240       250       260       270

The complete length ORF141a nucleotide sequence (SEQ ID NO: 597) is: 

1 ATGCTGACCT ataccccgcc cgatgcccgc ccgcccgcca aaacccacga 

51 aaagccgtgg ctgttgctgt tgatggcgtt tgcctggttg tggcccggcg 

101 tgttttccca cgatttgtgg aatcctgacg aacctgccgt ctataccgcc 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 
1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 
1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 598): 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPDEPAVYTA 

51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVVG LTSCGFAGFN FLGRHHGRSV VLILIGCIGL IPTVHFLNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP 

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD 

251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD 

301 WGILGVVWML AVLVLLAVNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL 

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT 

501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG 

551 ENILKTTD* 

ORF141a (SEQ ID NO: 598) and ORF141-1 (SEQ ID NO: 596) show 98.2% identity in 553 aa 
overlap: 

orf141a.pep  MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141a.pep  LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVVGLTSCGFAGFN
             |||||||| ||||||||||||||||||||||||| ||||||||||||| |||||||||||
orf141-1     LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141a.pep  FLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
             ||||||||||||||||||||||  ||||||||||||||||||||||||||||||||||||
orf141-1     FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141a.pep  GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141a.pep  QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD
             |||||||||| ||||||||||| |||||||||||||||||||||||||||||||||||||
orf141-1     QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141a.pep  WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141a.pep  FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141a.pep  NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141a.pep  CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
             || |||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf141-1     CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141a.pep  SKFALIRKTGENI
             |||||||| ||||
orf141-1     SKFALIRKIGENI
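
The identity figures quoted for these overlaps can be illustrated with a short sketch. It assumes that identity is the number of columns holding the same residue divided by the overlap length, with gap columns counted in the length but never as matches; the function name `percent_identity` is hypothetical, and the alignment program used for the analysis may apply a different convention.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over an alignment given as two equal-length strings.

    Assumption: gap characters ('-') count towards the overlap length
    but are never scored as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Final 13 aligned residues of ORF141a and ORF141-1 (one T/I difference).
orf141a_tail = "SKFALIRKTGENI"
orf141_1_tail = "SKFALIRKIGENI"
print(round(percent_identity(orf141a_tail, orf141_1_tail), 1))  # 12 of 13 match
```

Applied to the full 553 aa overlap, the same calculation with 543 identical positions gives the 98.2% quoted for ORF141a and ORF141-1.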

Homology with a predicted ORF from N. gonorrhoeae 

ORF141 (SEQ ID NO: 594) shows 95% identity over a 140aa overlap with a predicted ORF 
(ORF141ng) (SEQ ID NO: 600) from N. gonorrhoeae: 

orf141.pep                                DFGISPVYLWVAAAFKHLLSPWAADSYDVA  30
                                          |||| |||||||||||||||||||  || |
orf141ng    WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126

orf141.pep  RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL  90
            ||||||||||||||||||||||||||||| |||| |||||||||||| ||||||||||||
orf141ng    RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186

orf141.pep  VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP  140
            ||||||||||||||||||||||||||||||||||||||||||||||||||
orf141ng    VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 246

An ORF141ng nucleotide sequence (SEQ ID NO: 599) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 600): 

1 MPSEAVSARP LCEYLLHLAI RPFLLTLMLT YTPPDARPPA KTHEKPWLLL 

51 LMAFAWLWPG VFSHDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI 

101 PPVYLWVAAA FKHLLSPWAA HPYDAARFAG VFFAVIGLTS CGFAGFNFLG 

151 RHHGRSVVLI HIGCIGLIPV AHFFNPAAAA FAAAGLVLHG YSLARRRVIA 

201 ASFLLGTGWT LMSLAAAYPA AFALMLPLPV LMFFRPWQSR RLMLTAVASL 

251 AFALPLMTVY PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL 

301 KNLLWFAPPG LPLAVWTVCR TRLFSTDWGI LGIVWMLAVL VLLAFNPQRF 

351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFGIMAFGL FAVFLWTGFF 

401 AMNYGWPAKL AERAAYFSPY YVPDIDPIPM AVAVLFTPLW LWAITRKNIR 

451 GRQAVTNWAA GVTLTWALLM TLFLPWLDAA KSHAPVVRSM EASFSPELKR 

501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA 

551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD* 

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 601): 



1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 

251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 

451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 

551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TAGCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 
1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 
1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 
1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 602; ORF141ng-1): 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFSHDLW NPAEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVIG LTSCGFAGFN FLGRHHGRSV VLIHIGCIGL IPVAHFLNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLAAA YPAAFALMLP 

201 LPVLMFFRPW QSRRLMLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLN 

251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD 

301 WGILGIVWML AVLVLLAFNP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAITRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL 

451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT 

501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 

551 ENILKTTD* 

ORF141ng-1 (SEQ ID NO: 602) and ORF141-1 (SEQ ID NO: 596) show 97.5% identity in 553 aa 
overlap: 

orf141ng-1.pep  MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP
                |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf141-1        MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141ng-1.pep  LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN
                |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
orf141-1        LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141ng-1.pep  FLGRHHGRSVVLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
                ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf141-1        FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141ng-1.pep  GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1        GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141ng-1.pep  QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD
                ||||||||| |||||||||||| | ||||| ||||||||| | |||||||||||||||||
orf141-1        QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141ng-1.pep  WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
                ||||| ||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf141-1        WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141ng-1.pep  FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1        FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141ng-1.pep  NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE
                |||||||||||||||||||||||||||||||||||||||||||||| |||||||||||||
orf141-1        NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141ng-1.pep  CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
                |||||||||||||||||||||||||||| |||||| ||||||||||||||||||||||||
orf141-1        CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141ng-1.pep  SKFALIRKIGENILKTTDX
                |||||||||||||
orf141-1        SKFALIRKIGENIX

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
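
The text does not state how the putative transmembrane domains were identified. One common heuristic is a sliding-window hydropathy scan, sketched below under stated assumptions: the Kyte-Doolittle hydropathy scale, a 19-residue window, and a mean-hydropathy cut-off of 1.6. These parameters and the function name `hydrophobic_windows` are illustrative choices, not the method used for the analysis above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean) for each window whose mean hydropathy exceeds
    the threshold. Start positions are 1-based. Residues outside the
    20 standard letters (e.g. 'X') score 0.0 (an assumption).
    """
    seq = protein.rstrip("*")
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE.get(res, 0.0) for res in segment) / window
        if mean > threshold:
            hits.append((start + 1, round(mean, 2)))
    return hits


# First 60 residues of ORF141-1 (SEQ ID NO: 596) as a usage example.
fragment = "MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP"
for start, mean in hydrophobic_windows(fragment):
    print(start, mean)
```

Windows flagged by such a scan are only candidates; dedicated topology predictors apply further criteria before calling a transmembrane segment.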



Example 72 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 603): 

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT GTAGTCGGCA CAGCAATTGG 
51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 
101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 
151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 604; ORF142): 

1 . . QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA 
51 SGFQVGYTF* 
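
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a small translation sketch. It assumes the standard genetic code and translation in the reading frame in which the listing starts; codons containing ambiguous or unread bases are reported as X, which mirrors the use of X in the listings but is an assumption here. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, '*' marks a stop.

    Codons with any non-ACGT symbol are translated as 'X' (an assumption).
    """
    seq = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


# First 30 nucleotides of SEQ ID NO: 603 and its last three codons.
print(translate("CAATCCGCCA AATGGTTATC GGGCCAAACT"))  # QSAKWLSGQT
print(translate("ACGTTTTAA"))  # TF*
```

The two outputs agree with the start (QSAKWLSGQT) and end (TF*) of the ORF142 amino acid sequence listed above.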


Further work revealed the complete nucleotide sequence (SEQ ID NO: 605): 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 
251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 
301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 
401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 
501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 
551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 
601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 606; ORF142-1): 



1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVGY TF* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 



ORF142 (SEQ ID NO: 604) shows 88.1% identity over a 59aa overlap with a predicted ORF 
(ORF142ng) (SEQ ID NO: 608) from N. gonorrhoeae: 

orf142.pep                                QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY  30
                                          ||||||||||| ||||||||||||||||||
orf142ng    RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313

orf142.pep  DIFTGRALKKPEFFQSRKWASGFQVGYTF  59
            |||||||||||| ||  ||  |||||| |
orf142ng    DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342

The complete length ORF142ng nucleotide sequence (SEQ ID NO: 607) is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 

1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 608): 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVGY SF* 

The underlined C-terminal sequence YSF (an aromatic-Xaa-aromatic amino acid motif) is of a 
type usually found at the C-terminal end of outer membrane proteins. 
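
The aromatic-Xaa-aromatic check described here can be expressed as a short sketch. It assumes that "aromatic" means phenylalanine, tryptophan or tyrosine (F, W, Y) and that the motif occupies the last three residues before the stop; the function name `has_c_terminal_aromatic_motif` is hypothetical.

```python
AROMATIC = frozenset("FWY")  # assumption: Phe, Trp and Tyr only


def has_c_terminal_aromatic_motif(protein: str) -> bool:
    """True if the final three residues are aromatic-Xaa-aromatic.

    A trailing '*' (stop symbol, as used in the listings) is ignored.
    """
    seq = protein.rstrip("*")
    if len(seq) < 3:
        return False
    first, _, last = seq[-3:]
    return first in AROMATIC and last in AROMATIC


# C-terminal ends of ORF142ng (SEQ ID NO: 608) and ORF142-1 (SEQ ID NO: 606).
print(has_c_terminal_aromatic_motif("KWVTGFQVGYSF*"))  # ends in Y-S-F
print(has_c_terminal_aromatic_motif("KWASGFQVGYTF*"))  # ends in Y-T-F
```

Both the gonococcal (YSF) and meningococcal (YTF) sequences satisfy the pattern under this definition.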

ORF142ng (SEQ ID NO: 608) and ORF142-1 (SEQ ID NO: 606) show 95.6% identity over 342aa 
overlap: 

orf142-1.pep  MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA
              ||||||||||||||||||||||| ||||||||||||||||||||| ||||||||||||||
orf142ng-1    MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA

orf142-1.pep  VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf142ng-1    VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS

orf142-1.pep  VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA
              |||| ||||||||||||||||||| ||||||||| |||||||||||||| ||||||||||
orf142ng-1    VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA

orf142-1.pep  PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf142ng-1    PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

orf142-1.pep  VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG
              |||||||||| ||||||||||||||||||||||||||||||||||||||||||| |||||
orf142ng-1    VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG

orf142-1.pep  IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF
              ||||||||||||||||||||||||| ||  ||  |||||| |
orf142ng-1    IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF

In addition, ORF142ng (SEQ ID NO: 608) is homologous to the HecB protein (SEQ ID NO: 1149) 
of E. chrysanthemi: 

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558 
Score = 119 bits (295), Expect = 3e-26 
Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%) 

Query: 2   DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61 
           DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G 
Sbjct: 230 DNSGQKSTGEEQLNGSLALDNVFGLADQWFI SAGHS SRFATSHDAESLQAG 280 

Query: 62 
           +S P+G W +N++ RY + G S F +R+ ++RD KT + + 
Sbjct: 281 

Query: 122 
           +Y++ + L RK + ++H + A F Y G 
Sbjct: 340 

Query: 182 
           + + + E + WT SA P Y S+ + Q++ L ++L +GG + + 
Sbjct: 400 DKSADEPRAEFNKWTLSAS YYHPV TDS I TYLGSLYGQYSARALYGSEQLTLGGESS I 456 

Query: 242 RGFDGEMSLPAERGWYWRNDLSWQFKP GHQLYLGA- DVGHVSGQSAKWLSGQTLAG 296 
           RGF E RG YWRN+L+WQ G+ ++ A D GH+ + +L G 
Sbjct: 457 

Query: 297 
           A+G+ + L + G + P + Q V G++VG SF 
Sbjct: 516 



On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 73 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 609): 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCTTACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC. . 

This corresponds to the amino acid sequence (SEQ ID NO: 610; ORF143): 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 

101 KKYRLLIKNN . . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 611): 

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT 

151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 
601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA 



This corresponds to the amino acid sequence (SEQ ID NO: 612; ORF143-1): 

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV 

201 TLVRILYRRY SNRV* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF143 (SEQ ID NO: 610) shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) 
(SEQ ID NO: 614) from strain A of N. meningitidis: 

10 20 30 

orf 143 . pep MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFL 

h : III 11111111:1 IMIIIIM 

orf 143a GAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTADIDTALNLLYRLQKLEFL 
5 20 30 40 50 60 70 

40 50 60 70 80 90 

or f 14 3 . pep YGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 

lllllllllllll llllllllllllllllllllllllllllllllllllllllllllll 

orf 143a YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 
10 80 90 100 110 120 130 

100 110 
orf 143. pep VAQMEKKYRLL I KNN 
llllllllll I I I I 

orf 14 3a VAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSEL fFFPLYIGSTKFILVIGG IPDLGKEA 
15 140 150 160 170 180 190 

The complete length ORF143a nucleotide sequence (SEQ ID NO: 613) is: 

1 ATGGAATCAA CANTTTCACT ACAAGCAAAT TTATATCNCC GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGNCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCA CAGCCTGTTG AAAGCGGATG CGGACGAAAT GGTNAGCAGT 

151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA 

201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG 

251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG 

301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT 

351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT 

401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT 

451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC 

501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT 

551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT 

601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT 

651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 614): 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS 

51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM 

101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI 

151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV 

201 TLVRXLYXXL QQPRVKLGRE XGLCSNY* 

ORF143a (SEQ ID NO: 614) and ORF143-1 (SEQ ID NO: 612) show 97.1% identity in 207 aa 
overlap: 

orf143a.pep  MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA
             |||| ||||||| |||||||||||||| ||||||||||||||||||||||||||||| ||
orf143-1     MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA

orf143a.pep  DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf143-1     DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA

orf143a.pep  NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG
             |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf143-1     NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG

orf143a.pep  STKFILVIGGIPDLGKEAFVTLVRXLY
             |||||||||||||||||||||||| ||
orf143-1     STKFILVIGGIPDLGKEAFVTLVRILY

Homology with a predicted ORF from N. gonorrhoeae 

ORF143 (SEQ ID NO: 610) shows 95.5% identity over a 110aa overlap with a predicted ORF 
(ORF143ng) (SEQ ID NO: 616) from N. gonorrhoeae: 

orf143.pep  MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL  60
            |||||||||||  ||||||||||||||||||||||||||||||||||| |||||||||||
orf143ng    MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL  60

orf143.pep  SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN  110
            |||||||||||||||||||||||| |||||||||||||||||||||| ||
orf143ng    SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120

An ORF143ng nucleotide sequence (SEQ ID NO: 615) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 616): 

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD 

51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME 

101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD 

151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA 

201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV 

251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI* 

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 617): 

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC 

51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA 

101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT 

151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT 

201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC 

251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA 

301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA 

351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT 

401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG 

451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG 

501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA 

551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT 

601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA 

This corresponds to the amino acid sequence (SEQ ID NO: 618; ORF143ng-1): 

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS 

51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME 

101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR 

151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT 

201 LVRILYRRYS NRV* 

ORF143ng-1 (SEQ ID NO: 618) and ORF143-1 (SEQ ID NO: 612) show 95.8% identity in 214 aa 
overlap: 

orf143ng-1.pep  MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEVVSSEKLLA-ADTA  59
                ||||||||||||| |||||||||||||||||||||| ||||||||| |||||||  ||||
orf143-1        MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA  60

orf143ng-1.pep  DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf143-1        DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120

orf143ng-1.pep  NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 179
                ||||||| |||||||||||||||||||||| |||||||||||||||||||||||||||||
orf143-1        NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180

orf143ng-1.pep  STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213
                |||||||| ||||| |||||||||||||||||||
orf143-1        STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 74 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 619): 



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 
251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 
351 GACGATAGAC AATACGTTCA ACCGCATCTG G^CGGGTCAA wTyCCAGCGT 

401 CCGTGGATG . . 



This corresponds to the amino acid sequence (SEQ ID NO: 620; ORF144): 



1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL 
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL 
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM . . . 



Further work revealed the complete nucleotide sequence (SEQ ID NO: 621): 



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G

This corresponds to the amino acid sequence (SEQ ID NO: 622; ORF144-1): 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV
201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKRQ*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF144 (SEQ ID NO: 620) shows 96.3% identity over a 136aa overlap with an ORF (ORF144a)
(SEQ ID NO: 624) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf144.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF

III MIIIMIIIIMIIMIMI illllll MIIIIIIIIMI MM IMIIII

orf144a MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

10 20 30 40 50 60

70 80 90 100 110 120

orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID
I I ! I I I I I I I I I I I II I I I I M I I I I I I M I I I I I I I I I I I I I I I II Illllll
orf144a PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID

70 80 90 100 110 120

130

orf144.pep NTFNRIWRVXXQRPWM
II I III II I I I I I I
orf144a NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL

130 140 150 160 170 180 
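
For illustration, a percentage identity of the kind quoted above can be recomputed from the two rows of a pairwise alignment. The following Python sketch is an assumption about how such a figure may be derived (identical residues divided by aligned, ungapped columns); the alignment program actually used is not identified here, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percentage of aligned, ungapped columns holding the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns in which neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Final block of the ORF144 / ORF144a overlap; X counts as a mismatch here.
print(percent_identity("NTFNRIWRVXXQRPWM", "NTFNRIWRVNSQRPWM"))  # 87.5
```

Under this definition, 131 identical positions in a 136aa overlap would round to the 96.3% stated above. Treating an undetermined residue (X) as a mismatch is a choice made in this sketch and may differ from the scoring of the original program.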

The complete length ORF144a nucleotide sequence (SEQ ID NO: 623) is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC

51 GTTTGCATGG TTCGTCGTCC gCCGCTTTGA TGAAGAACGC GTACCGCAGG

101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG

551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACAGCAGCA ATCTTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 624):



1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP
151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV
201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL
301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA
401 QAKKQQQS*

ORF144a (SEQ ID NO: 624) and ORF144-1 (SEQ ID NO: 622) show 97.8% identity in 406 aa 
overlap: 



orf144a.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

M I II 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M I M M i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 i I I

orf144-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

orf144a.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSXMLIRTID

I h 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I Mill 1 1 1 1 1 lllllll

orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID

orf144a.pep NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFXVGSVQDAALASGAPQWSGAL

I III INI MM I II MM I II INI Mil III III I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

orf144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL

orf144a.pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS

lllllhllllllMIII lllllllll 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Ml 1 1

orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS

orf144a.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL

Illllll llllllll'llllllll IIIIIMhllMIM IIIIIIIIIIIMIIII

orf144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL

orf144a.pep DAAQKEGXALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL

M I E 1 1 1 IMIMI IIIMIII illlllllllllMlllillMII lllllllll

orf144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL

orf144a.pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408

lllllllllllllll llllll ' 1 1 1 1 1 1 M 1 1 1 1 1 M I M--I

orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 406



Homology with a predicted ORF from N. gonorrhoeae 



ORF144 (SEQ ID NO: 620) shows 91.2% identity over a 136aa overlap with a predicted ORF
(ORF144ng) (SEQ ID NO: 626) from N. gonorrhoeae:

orf144.pep MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60

Mill II lllllllllllhllhllllll 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 1 1 1 1 h

orf144ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60

orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID 120

Illlllllllllllllllll I I I ! I I I I h I I h I I I I I II hi I h I I I I I I I I h

orf144ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID 120

orf144.pep NTFNRIWRVXXQRPWM 136
hi MINI UN II
orf144ng NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 180

The complete length ORF144ng nucleotide sequence (SEQ ID NO: 625) is predicted to encode a
protein having amino acid sequence (SEQ ID NO: 626):

1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT
351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKQQQS*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 627):

1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC

51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG

101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG

151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG 

301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC 

401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT 

451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC 

501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG 

551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC 

651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA 

701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT 

751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT 

801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG 

1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA 

1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT 

1201 CAGgcgAAAA AACAGCAGCA GTCTTGA 

This encodes a variant of ORF144ng, having the amino acid sequence (SEQ ID NO: 628;
ORF144ng-1):



1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT
351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKQQQS*

ORF144ng-1 (SEQ ID NO: 628) and ORF144-1 (SEQ ID NO: 622) show 94.1% identity in 406 aa
overlap:

orf144ng-1.pep MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

MINI 1 1 II III Mill III: llh IIIIIMIMI I III I III mil III II III I

orf144-1 MTFLQRLQGLADNKICAFAWFVVRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF

orf144ng-1.pep PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLVVTSLMLIRTID

1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 II 1 1 1 M hi h 1 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i

orf144-1 PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID

orf144ng-1.pep NAFNRIWRVNTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL
hill llllhll hlllllhhhhlhhlhhl llll-h II Ih II
orf144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL

orf144ng-1.pep KTAARLAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS

hi I 'h II 1 1 1 1 ; 1 1 1 1 1 1 1 1 1 1 h I h 1 1 h I II III Ml II II I II II III II

orf144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS

orf144ng-1.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL
orf144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL

orf144ng-1.pep DAAQKEGRTLSVQEFRRHINMGYDELGELLEKLARYGYIYSGRQGWVLKTGADSIELSEL
orf144-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL

orf144ng-1.pep FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKQQQS
orf144-1 FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ

On the basis of this analysis, including the identification of several putative transmembrane
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

Example 75

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 629):

1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA
51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA
101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC
151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG
201 CCTGCTTGAA ACACGGGAAC ACGGCTGA

This corresponds to the amino acid sequence (SEQ ID NO: 630; ORF146):

1 ..RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR
51 TRRKWLDAHE RQHLRQSLLE TREHG*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 631):

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA
51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG
101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC
151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA
201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG
251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC
301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG
351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA
401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA
451 CTCATGCGCG CCATGAACGT GCTCATCGGC GCGGCCATCG CCATCGCCGC
501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG
551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC
601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA
651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG
701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC
751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT
801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA
1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 
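
As an illustration of how a partial sequence relates to a complete one, the position of a fragment within the full-length sequence can be located by sliding it along the longer sequence and counting mismatches. The following Python sketch is offered only as an example under stated assumptions: it considers ungapped placements only, it is not the method used to assemble the sequences above, and the function name `best_offset` is hypothetical.

```python
def best_offset(fragment, full):
    """Return (offset, mismatches) for the best ungapped placement.

    The offset is 0-based; ties are resolved in favour of the
    leftmost placement.
    """
    if len(fragment) > len(full):
        raise ValueError("fragment is longer than the full sequence")
    best = None
    for offset in range(len(full) - len(fragment) + 1):
        window = full[offset:offset + len(fragment)]
        mismatches = sum(1 for a, b in zip(fragment, window) if a != b)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


# Residues from the partial ORF146 placed against a stretch of ORF146-1.
print(best_offset("LWLSTDMRQE", "EHLHYQWQGFLWLSTNMRQEISALV"))  # (10, 1)
```

In this example the ten residues taken from the partial sequence align with a single mismatch (D in the partial sequence against N in the complete sequence). The same function can be applied to nucleotide strings.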

This corresponds to the amino acid sequence (SEQ ID NO: 632; ORF146-1):

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG
51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH
101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG
201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR
351 TRRKWLDAHE RQHLRQSLLE TREHG*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 (SEQ ID NO: 630) shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) 
(SEQ ID NO: 634) from strain A of N. meningitidis: 

10 20 30

orf146.pep RHARRIRIDTAINPELEALAEHLHYQWQGF

lllllll IIIIIMIIIII lllllll
orf146a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF

280 290 300 310 320 330

40 50 60 70

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX

I I II :| I I I I I I I I I I I I I I I II I I M II I I I I M I I I II I II:
orf146a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX

340 350 360 370



The complete length ORF146a nucleotide sequence (SEQ ID NO: 633) is: 



1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 634): 



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG
51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH
101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG
201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR
351 TRRKWLDAHE RQHLRQSLLE TREHS*

ORF146a (SEQ ID NO: 634) and ORF146-1 (SEQ ID NO: 632) show 99.5% identity in 374 aa 
overlap: 



orf146a.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV

1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 T 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1

orf146-1 MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV

orf146a.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA

llllll IIIIIIM MINIM MMIIII llllllllllllllllllllllll

orf146-1 LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA

orf146a.pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

I I II I I I ' I I I I II I I I I I I M I I I h I I I I I I I ■ I M I I I I I I I M I I I I I I I I I I I M
orf146-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

orf146a.pep FMLADNLTDCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP
| | | | | | | : | | | M | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |

orf146-1 FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP

orf146a.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I
orf146-1 AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING

orf146a.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I ! I I I I I I I I I I I I I I I I I
orf146-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

orf146a.pep RQHLRQSLLETREHSX

III llllll MM h
orf146-1 RQHLRQSLLETREHGX

Homology with a predicted ORF from N. gonorrhoeae 

ORF146 (SEQ ID NO: 630) shows 97.3% identity over a 75aa overlap with a predicted ORF 
(ORF146ng) (SEQ ID NO: 636) from N. gonorrhoeae: 






orf146.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 30

I I M I I I I I I M M I I I I I I I I ! I I
orf146ng KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364

orf146.pep LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 75

I I I I : I I II I M I I I I I I I I I I I I I I I I II I I 1 I I I I I I I I !
orf146ng LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG 409

An ORF146ng nucleotide sequence (SEQ ID NO: 635) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 636): 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDMNSSQR KRLSGRWLNS
51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF
101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA
151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA
201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR
251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK
301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL
351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ
401 SLLETREHG*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 637): 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 638; ORF146ng-l): 

1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG
51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH
101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG
201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR
351 TRRKWLDAHE RQHLRQSLLE TREHG*

ORF146ng-1 (SEQ ID NO: 638) and ORF146-1 (SEQ ID NO: 632) show 96.5% identity in 375 aa
overlap:

orf146-1.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV

Ihllhll :| I I I I U M I I I I I I M II U . I I I I IIIIIIIIIIMIII I
orf146ng-1 MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVV

orf146-1.pep LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA

llllll Mill M II IMIIIMIIIIIMM IMM MM IMIIMM IIMIIMI

orf146ng-1 LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA

orf146-1.pep VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

I II II 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II M 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I M 1 1 1

orf146ng-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR

orf146-1.pep FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP

I M I M I MM I II I II 1 1 1 1 : 1 1 = 1 1 MM M I M 1 1 1 1 1 II I II 1 1 1 1

orf146ng-1 FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP

orf146-1.pep AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING

M 1 1 II II II 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 M I II 1 1 1 II II M I II 1 1 1 1 1 1 1 1 Ml M I

orf146ng-1 SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING

orf146-1.pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

MIMMMIMIMM llllllllllllillllllll MM MIMMMIMM

orf146ng-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE

orf146-1.pep RQHLRQSLLETREHGX

Illlllllllllllll
orf146ng-1 RQHLRQSLLETREHGX

Furthermore, ORF146ng-1 (SEQ ID NO: 638) shows homology with a hypothetical E. coli protein
(SEQ ID NO: 1150):

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBU-SBMC INTERGENIC REGION
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839)
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli]
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but
has 203 additional C-terminal residues [Escherichia coli] Length = 352
Score = 109 bits (271), Expect = 2e-23

Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%)

Query: 20 YRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVVLGMLQFQGAIYSNAVERML 79
YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+
Sbjct: 15 YRHYRIVHGTRVALAFLLTFLIIRLFTIPESTWPLVTMWIMGPISFWGNWPRAFERIG 74

Query: 80 GTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139
GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++
Sbjct: 75 GTVLGSILGLIALQLE---LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIW 131

Query: 140 GDNGSEWLDSGLMRAMNVLIGXXXXXXXXKLLPLKSTLMWRFMLADNLADCSKMIAEISN 199
G E +D+ L R+ +V++G + P ++. + WR LA +L + + + + +
Sbjct: 132 GSPTGE-IDTALWRSGDVILGSLLAMLFTGIWPQRAFIHWRIQLAKSLTEYNRVYQSAFS 190

Query: 200 GRRMTRERLEQNMVKMRQ I NARMVKSRSHLAATSGESR I S PSMMEAMQHAHRKI VNXXXX 259 

+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 

Sbjct: 191 PNLLERPRLESHLQKLL TDAVKMRGL I APAS KETR I PKS I YEG I QT INRNLVCMLEL 247 

Query: 260 XXXXXXXXQSPK LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 

+ LN ++R D AL G +N + 

Sbjct: 248 QINAYWATRPSHFVLLNAQKLR- -DTQHMMQQILLSLVHALYEGNPQPVFANTEKLNDAV 305 

Query: 317 EALAEHL - - HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in
the gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 76 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 639):

1 ..GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA
51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA
101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT
151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT
201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC.GCGGTGA
251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC
301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG
351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG
401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG
451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT
501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG
551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG
601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC
651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG
701 CTTTGTACGA T..

This corresponds to the amino acid sequence (SEQ ID NO: 640; ORF147): 



1 ..AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD
51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN
101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM
151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL
201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 641): 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 
201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence (SEQ ID NO: 642; ORF147-1): 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT
51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP
101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP
151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE
201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA
251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical protein ORF286 (SEQ ID NO: 1151) of E. coli (accession number
U18997)

ORF147 (SEQ ID NO: 640) and E. coli ORF286 protein (SEQ ID NO: 1151) show 36% aa identity
in 237aa overlap:



Orf147: 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPG 60
AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG
Orf286: 43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 102

Orf147: 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 120
L R RE F + GF+P KS RR
Orf286: 103 YHLVRTCREAGIRWPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 162

Orf147: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 179
+ + +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ + D +
Orf286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 222

Orf147: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236
+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY
Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF147 (SEQ ID NO: 640) shows 96.6% identity over a 237aa overlap with ORF75a (SEQ ID
NO: 290) from strain A of N. meningitidis: 

10 20 30

orf147.pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ

I I I I I I I I I I II T I I I I I I I I I M I I I I

orf75a TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ

20 30 40 50 60 70

40 50 60 70 80 90

orf147.pep MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA

Mill NIMH IIIIIIIIIIIIIMIIIIII IIIIIMIIIIMIII III I II III

orf75a MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREVGFKVVPVVGASAVMAALSVA
80 90 100 110 120 130

100 110 120 130 140 150

orf147.pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM

|| 1 1 M 1 1 . II 1 1 1 1 1 1 1 ! 1 1 1 M 1 1 1 h 1 1 h 1 1 1 1 1 1 1 i 1 1 h I M 1 1 1 1 1 1 1' 1 1 1

orf75a GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIGATLADMAELFPERRLM
140 150 160 170 180 190

160 170 180 190 200 210

orf147.pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI

Ml | Ml 11 1 Mill 1 1 Ml 1 1 1 hill: III III I II MM I MM II II III I Mill

orf75a LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI
200 210 220 230 240 250

220 230
orf147.pep LTAELPTKQAAELAAKITGEGKKALYD
| | | || | | || | | | | | || | | | || | | | | ||

orf75a LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
260 270 280 290

ORF147a is identical to ORF75a (SEQ ID NO: 290), which includes aa 56-292 of ORF75 (SEQ ID
NO: 286).

Homology with a predicted ORF from N. gonorrhoeae 

ORF147 (SEQ ID NO: 640) shows 94.1% identity over a 237aa overlap with a predicted ORF 
(ORF147ng) (SEQ ID NO: 644) from N. gonorrhoeae: 

orf147.pep                                AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30
                                          |||||||||||||||||| |||||||||||
orf147ng    TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85

orf147.pep  MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 90
            ||||  | |||| ||||||||||||||||||||||||||||||||||||| |||||||||
orf147ng    MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGASAVMAALSVA 145

orf147.pep  GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150
            ||  ||||||||||||||||||||||||||||| ||||||||||| ||||||||||||||
orf147ng    GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATLADMAELFPERRLM 205

orf147.pep  LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210
            |||||||||||||||||||||||| ||| ||||||||||||||||||||||||||| |||
orf147ng    LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265

orf147.pep  LTAELPTKQAAELAAKITGEGKKALYD 237
            | |||||||||||||||||||||||||
orf147ng    LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300

An ORF147ng nucleotide sequence (SEQ ID NO: 643) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 644): 



1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK
51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV
101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES
151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP
201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE
251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK
301 *

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 645): 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 
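The listings in this document print each sequence in rows of up to five 10-residue blocks, each row preceded by the position of its first residue. As a minimal sketch (the helper name `join_numbered_blocks` is hypothetical and not part of the original disclosure), such a listing can be joined back into one contiguous string before further analysis:

```python
import re


def join_numbered_blocks(listing: str) -> str:
    """Join a position-numbered, block-formatted listing into one string.

    Assumption: each row optionally starts with a position number and the
    remaining tokens are sequence blocks separated by spaces.
    """
    residues = []
    for line in listing.splitlines():
        tokens = line.split()
        # Drop the leading position number (for example "51") if present.
        if tokens and tokens[0].isdigit():
            tokens = tokens[1:]
        for token in tokens:
            # Keep letters, '*' (stop) and '.' (undetermined position);
            # stray digits or punctuation left by scanning are discarded.
            residues.append(re.sub(r"[^A-Za-z*.]", "", token))
    return "".join(residues)


if __name__ == "__main__":
    rows = "1 ATGTTTCAGA AACACTTGCA\n\n851 TGGCACTGTC GTGGAAAAAC AAATGA"
    print(join_numbered_blocks(rows))
```

Because digits are stripped from every token, a row number that was split by scanning (such as "3 01") does not leak into the joined sequence.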

This corresponds to the amino acid sequence (SEQ ID NO: 646; ORF147ng-1):



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT
51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP
101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP
151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE
201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA
251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K*
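The correspondence between a nucleotide sequence and its predicted amino acid sequence can be checked with a short translation routine. The following is a sketch only, assuming the standard genetic code and reading frame 1; the function name `translate` and the choice to report any codon containing an undetermined base as X are illustrative assumptions, not a description of the software used for the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Codons containing anything other than A, C, G or T (for example an
    undetermined base written as N) are reported as 'X'. Translation does
    not halt at a stop codon, and trailing bases that do not fill a codon
    are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 bases of SEQ ID NO: 645 as listed above.
    print(translate("ATGTTTCAGAAACACTTGCAGAAAGCCTCC"))
```

Applied to the first 30 bases of SEQ ID NO: 645, this yields the first ten residues listed for SEQ ID NO: 646.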



ORF147ng-1 (SEQ ID NO: 646) shows homology to a hypothetical E. coli protein (SEQ ID NO:
1152):



sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)

>gi|606086 (U18997) ORF_f286 [Escherichia coli]



CHIR-0160 (356.001) 



-454- 



PATENT 



>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region
[Escherichia coli] Length = 286
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4   KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
           K  Q A +S   G LY+V TPIGNLADIT RAL VLQ  D+I AEDTR T  LL  +GI
Sbjct: 2   KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64  GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123
            RL ++ +HNE+Q A+ ++  L +G  +A VSDAGTP + DPG  L R  REAG +VVP+
Sbjct: 60  ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183
            G  A + ALS AG+    F + GF+P KS  RR            ++ +E+ HR+  +L
Sbjct: 120 PGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAEPRTLIFYESTHRLLDSL 179

Query: 184 ADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEK 242
            D+  +  E R ++LARE+TKT+ET     VGE+   +  D N+ +GEMVL++      +
Sbjct: 180 EDIVAVLGESRYVVLARELTKTWETIHGAPVGELLAWVKEDENRRKGEMVLIV-EGHKAQ 238

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286
            E L   A   + +L AELP K+AA LAA+I G  K ALY  AL
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282
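The identity and gap figures reported for an alignment such as the one above can be recomputed from the two gapped sequence rows. The sketch below is illustrative only: the function name `alignment_stats` is hypothetical, a gap is assumed to be written as "-", and the "Positives" figure is not reproduced because it depends on the substitution matrix used by the search program.

```python
def alignment_stats(query: str, subject: str, gap: str = "-") -> dict:
    """Count identities and gapped columns over two aligned, equal-length rows.

    Identical ambiguous residues (for example X aligned with X) are counted
    as identities in this simple version.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    if not query:
        raise ValueError("aligned rows must not be empty")
    identities = sum(
        1 for q, s in zip(query, subject) if q == s and q != gap
    )
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return {
        "length": len(query),
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / len(query),
    }


if __name__ == "__main__":
    # A seven-column excerpt in the style of the alignment rows above.
    print(alignment_stats("ERR-LML", "ESRYVVL"))
```

Concatenating the five Query rows and the five Sbjct rows and passing them to this function is the intended use; the reported 128 identities over 284 aligned columns correspond to the stated 45%.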

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 77 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 647):



1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA
51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG
201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT
251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC
301 GTGGCGGcAT TGGTGGGCGt ATCAATATAT TGTGAGCGTG GCACATAACG
351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC.GAT
401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG
451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA
501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG
551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC
601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA
651 GTTCATATCA TATTGCAAGT
701                 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA
751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA
801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC
851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA
901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC
951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG
1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC
1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT
1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT

1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA 

1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA 

1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC 

1301 GGCAAAGGCA CGCTG 

// 

2101 GATAAAG 

2151 TGACTGCTTC ATTGACTAAG ACCGACATCA GCGGCAATGT CGATCTTGCC 

2201 GATCACGCTC ATTTAAATCT CACAGGGCTT GCCACACTCA ACGGCAATCT 

2251 TAGTGCAAAT GGCGATACAC GTTATACAGT CAGCCACAAC GCCACCCAAA 

2301 ACGGCAACCk TAgCCtCGtG G.sAATGcCC AAGCAACATT TAATCAAGCC

2351 ACATTAAACG GCAACACATC GGCTTCgGGC AATGCTTCAT TTAATCTAAG 

2401 CGACCACGCC GTACAAAACG GCAGTCTGAC GCTTTCCGGC AACGCTAAGG

2451 CAAACGTAAG CCATTCCGCA CTCAACGGTA ATGTCTCCCT AGCCGATAAG 

2501 GCAGTATTCC ATTTTGAAAG CAGCCGCTTT ACCGGACAAA TCAGCGGCGG 

2551 CAagGATACG GCATTACACT TAAAAGACAG CGAATGGACG CTGCCGTCAg 

2601 GarCGGAATT AGGCAATTTA AACCTTGACA ACGCCACCAT TACaCTCAAT 

2651 TCCGCCTATC GCCACGATGC GGCAGGGGCG CAAACCGGCA GTGCGACAGA 

2701 TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG CCGTTCCCTA TTATmCGTTA 

2751 CACCGCCAAC TTCGGTAGAA TCCCGTTTCA ACACGCTGAC GGTAAACGGC 

2801 AAATTGAACG GTCAGGGAAC ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA

2851 CCGCAGCGAC AAATTGAAGC TGGCGGAAAG TTCCGAAGGC ACTTACACCT

2901 TGGCGGTCAA CAATACCGGC AACGAACCTG CAAGCCTCGA ACAATTGACG

2951 GTAGTGGAAG GAAAAGACAA CAAACCGCTG TCCGAAAACC TTAATTTCAC

3001 CCTGCAAAAC GAACACGTCG ATGCAGGCGC GTGG

// 

3551 TTAGAC CGCGTATTTG CCGAAGACCG 

3601 CCGCAACGCC GTTTGGACAA GCGGCATCCG GGACACCAAA CACTACCGTT

3651 CGCAAGATTT CCGCGCCTAC CGCCAACAAA CCGACCTGCG CCAAATCGGT 

3701 ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC GGCATCCTGT TTTCGCACAA

3751 CCGGACCGAA AACACCTTCG ACGACGGCAT CGGCAACTCG GCACGGCTTG 

3801 CCCACGGCGC CGTTTTCGGG CAATACGGCA TCGACAGGTT CTACATCGGC 

3851 ATCAGnCGCG GGCGCGGGTT TTAGCAGCGG CAGCCTTTcA GACGGCATCG

3901 GAGsmAAAwT CCGCCGCCGC GTGCtGCATT ACGGCATTCA GGCACGAtAC

3951 CGCGCCGgtt tCggCGgATt CGGCATCGAA CCGCACATCG GCGCAACGCg

4001 ctATTTCGTC CAAAAAGCGG ATTACCGCTA CGAAAACGTC AATATCGCCA
4051 CCCCCGGCCT TGCATTCAAC CGcTACCGCG CGGGCATTAa GGCAGATTAT 
4101 TCATTCAAAC CGGCGCAACA CATTTCCATC ACGCCTTATT TGAGCCTGTC 
4151 CTATACCGAT GCCGCTTCGG GCAAAGTCCG AACACGCGTC AATACCGCCG 
4201 TATTGGCTCA GGATTTCGGC AAAACCCGCA GTGCGGAATG GGgCGTAAAC 
4251 GCCGAAATCA AAGGTTTCAC GCTGTCCCTC CACGCTGCCG CCGCCAAAGG 
4301 CCCGCAACTG GAAGCGCAAC ACAGCGCGGG CATCAAATTA GGCTACCGCT 
4351 GGTAA. . . 

This corresponds to the amino acid sequence (SEQ ID NO: 648; ORF1): 



1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG
151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA
201 GRQYWRSDED EPNNRESSYH IAS                GS PMFIYDAQKQ
251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK
301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA
351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE
401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL..

//

701            ....DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL
751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS
801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG
851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD
901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY
951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTVVEGKDN KPLSENLNFT
1001 LQNEHVDAGA W

//

1151 LDRVFAEDR
1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN
1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG
1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT
1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV
1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW
1451 *

Further sequencing analysis revealed the complete nucleotide sequence (SEQ ID NO: 649): 



1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG
351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT
451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 
501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 
551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 
601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA
1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCACGTTCAA GCCAAAGGGG AAAACCAAGG CTCGATCAGC GTGGGCGACG
1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 
1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 
1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 
2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG
2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA

2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT

2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 

2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 

2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 

3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG

3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG

3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 

3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 

3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 

3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC

3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA

3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG 

3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC

3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC

3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 

3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC

3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 

3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 

3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 

3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 

3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA

3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 

3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA

3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC

4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC

4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 

4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 

4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 

4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC 
4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC

4351 ATCAAATTAG GCTACCGCTG GTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 650; ORF1-1): 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT
151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF
301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS
351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE
401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK
451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT
551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD
601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA
651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK
701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS
751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL
801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS
851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL
901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT
951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN
1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG
1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES
1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR
1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD
1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV
1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG
1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY
1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR
1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG
1451 IKLGYRW*

Computer analysis of these sequences gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF1 (SEQ ID NO: 648) shows 57.8% identity over a 1456aa overlap with an ORF (ORF1a)
(SEQ ID NO: 652) from strain A of N. meningitidis:



10 20 30 40 50 60 

20 orf 1 . pep MKTTDKRTTETHRKAP KTGR I RFXAAYLA ICLSFGI L PQAWAGHT YFG I N YQ Y YRDFAEN 

Illllllllllllllllllllll 1 1 1 1 : 1 II 1 1 1 : 1 1 1 1 1 1 1 M 1 1 1 1 M 1 1 M 1 1 

or f la MKTTDKRTTETHRKAP KTGR I RFS PAYLAICLS FGI L PQAWAGHTYFG IN YQY YRDFAEN 

10 20 30 40 50 60 



70 80 90 100 110 120 

25 orf 1 .pep KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFS WSRNGVAALVGVQY I VS VAHNGGYN 

1 1 1 1 1 1 1 1 1 II 1 1 1 1 i 1 1 1 1 IM 1 1 1 1 M 1 1 1 1 1 1 ' 1 1 1 1 1 1 1 lllllllllllll 

orf la KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFSWSRNGVAALVGDQY I VS VAHNGGYN 

70 80 90 100 110 120 



130 140 150 160 170 180 

30 orf 1 . pep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

Illlllllll II I :|:|lllllll llhll lllllll lllllllllll 

orf la NVDFGAEGXN- PDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSD 

130 140 150 160 170 



190 200 210 
35 orf 1 .pep MDGRKY I DQNN YPDRVR I GAGRQ YWRSDEDE P NN 

I I I I- : I I : I M I : I - I I hh II 
orf la MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 
180 190 200 210 220 230 



220 230 240 250 260 

40 orf 1. pep RESSYH IA SGS PMF I YDAQKQKWL INGVLQTGNP Y I GKSNGFQLVRK 

: || lllllllll -IIMIIMI II h llllhll 

orf la SGDVRHANDYGPMP I AGAAGDSGS PMF I YDKTNNKWLLNGVLQTGYPYSGRENGFQL IRK 

240 250 260 270 280 290 



270 280 290 300 310 320 

45 orf 1 . pep DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 

11111 = 1= MM :|lhlh:|h::MIII == =h I I H I = = I I = 1 I = 
orf la DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 

300 310 320 330 340 350 






330 340 350 360 370 380 

orf 1 . pep SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT 

I hi I HIM II II I Mill II II MM I MM MM II MINIMI II 

orf la SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 
5 360 370 380 390 400 410 

390 400 410 420 430 

orf 1 . pep VSPENNETWQGAGraiSEDSTVTWKVNGVANDRLSKIGKGTL 

I I III II I I Ml I I I I I I I I I I I I I I I I I Ml I III III I 
orf la VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
10 420 430 440 450 460 470 



orf 1 .pep 



orf la VILDQQADDKGKKQAFSEIGIiXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH 
15 480 490 500 510 520 530 



orf 1 .pep 



orf la RIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFGEKDTTK 
20 540 550 560 570 580 590 



orf 1 .pep 



orf la TNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSKMEG 
25 600 610 620 630 640 650 



orf 1 .pep 



orf la IPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
30 660 670 680 690 700 710 

440 450 460 470 480 

orf 1 . pep XXXXXDKVTAS LTKTD I SGNVDLADHAHLNLTGLATLNGNLSAN 

II : III lllllll II I I hi hi Mill 

orf la T I CTRSDWTGLTNCVEXX I TDDKV I ASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLS AN 

35 720 730 740 750 760 770 

490 500 510 520 530 540 

orf 1 . pep GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNAS FNLSDHAVQNGSLTLSG 

MIMMMMMI Ml MMIIIMIIIMI III II I II h M = II 1 1 1 1 1 1 

or f la GDTRYTVSHNATQNGNLS LVGNAQATFNQATLNGNXSXSGNAS FNLSNNAAQNGSLTLSD 

40 780 790 800 810 820 830 



550 560 570 580 590 600 

orf 1 . pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 

1 1 1 1 1 1 1 II I III 1 1 1 M 1 1 1 1 II I M 1 1 1 M M I M I MM IMMIIIIIMIMI 

orf la NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
45 840 850 860 870 880 890 

610 620 630 640 650 660 

orf 1 . pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 

MIMIIIIMM MM MMIIIIMI II 1 1 1 M I M II M 1 1 1 M I 

orf la NLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFNTLTVNG 

50 900 910 920 930 940 950 






670 680 690 700 710 720 

orf 1 . pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEGKDNKPL 

III I II II I II Ml I I I I M I I I MM I II I II II I I I I II M I MM M Ml II I I I I 
orf la KLNXQGTFRFMSELFGYRSDKIiKIAESSEGTYTLAVNNTGNEPVSLDQLTVVEGKDNKPL 
960 970 980 990 1000 1010 






730 740 750 

or f 1 . pep SENLNFTLQNEHVDAGAW 

II lllllllllllllll 

orf la S ENLNFTLQNEHVDAGAWRYQL I RKDGE FRLHNP VKEQELS DKLGKAEAKKQAEKDNAQS 

1020 1030 1040 1050 1060 1070 






orf 1 .pep 
orf la 



LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP 
1080 1090 1100 1110 1120 1130 






760 

orfl.pep LDR 

III 

orf la XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 
1140 1150 1160 1170 1180 1190 






770 780 790 800 810 820 

orf 1 . pep VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 

lllllllllllll ll MIMI MM MMMMMMMMIMM MM M 

orf la VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
1200 1210 1220 1230 1240 1250 






830 840 850 860 870 880 

orf 1 . pep TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA 

MM 1 1 1 M I II 1 1 1 1 1 II I II II Mlhlllllll MM I lllllllllll 

orf la XFDDG I GNS ARLiAHGAVFGQYGI GRFD IG I STGAGFSSGXLSDG I GGKI RRRVLHYG I QA 

1260 1270 1280 1290 1300 1310 






890 ' 900 910 920 930 940 

orf 1 . pep RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 

MIMI MIMMIIMIIIMIIMI MM IMIMIMIIIIMIIMM I 

orf la RYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHX 
1320 1330 1340 1350 1360 1370 






950 960 970 980 990 1000 

orf 1 . pep S I TPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRS AEWGVNAE I KGFTLSLHAAAAKGP 

Mill I M 1 1 1 M 1 1 M II 1 1 M I II II M I M M I II I M M I M 1 1 II III II 

orf la S I TP YXSLSYTDAASGKVRTRVNTAVLAQDFGKTRS AEWGVNAE I KGFTLSXHAAAAKGP 

1380 1390 1400 1410 1420 1430 






orf 1 . pep 
orf la 



1010 1020 
QLEAQHS AGI KLGYRWX 

1 1 M 1 1 1 I! 1 1 ; 1 1 1 1 

QLEAQHS AG I KLGYRWX 
1440 1450 



The complete length ORF1a nucleotide sequence (SEQ ID NO: 651) is:



1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 
51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 

401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT 

451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT 

501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 

551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC

601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 

651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 

701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT 

751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA 

801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT

851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 

901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 

951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 

1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 

1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT

1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG 

1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 

1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT 

1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG 

1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG

1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 

1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG 

1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC

1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT

1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT

1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT 

1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 

1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG 

1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG 

1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC

1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 

1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 

1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 

2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC 

2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG

2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 

2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 

2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 

2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 

2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA

2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 

2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA

2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC 

2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 

2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC

2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG 

2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC 

2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 

2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT

2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG

2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 

2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA

2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT

3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 

3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC

3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 
3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA
3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG
3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG
3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC
3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG
3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA
3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG
3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG
3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC
3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA
3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC
3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG
3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA
3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA
3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC
3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC
3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT
4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC
4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT
4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA
4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG
4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA
4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC
4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA



This encodes a protein having amino acid sequence (SEQ ID NO: 652): 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN
151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH
201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP
251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW
301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ
351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN
401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL
451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG
501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH
551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR
601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG
651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL
701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS
751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ
801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN
851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN
901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT
951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD
1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE
1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG
1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ
1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR
1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR
1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG
1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP
1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL
1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW*



A transmembrane region is underlined. 
ORF1-1 (SEQ ID NO: 650) shows 86.3% identity over a 1462aa overlap with ORF1a (SEQ ID
NO: 652):



10 20 30 40 50 60 

orf la . pep MKTTDKRTTETHRKAPKTGR I RFS P AYLAI CLS FGI LPQAWAGHTYFG INYQYYRDFAEN 

1 1 1 1 M II I III III II M 1 1 M I II 1 1 II Ml M II 1 1 II 1 1 II II 1 1 Ml I ill I 

orf 1-1 MKTTDKRTTETHRKAPKTGRI RFS PAYLAI CLS FGI LPQAWAGHTYFG INYQYYRDFAEN 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf la . pep KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFS WSRNGVAALVGDQY I VS VAHNGGYN 

I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I I I 
or f 1 - 1 KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I D FS WSRNGVAALVGDQY I VS VAHNGGYN 

70 80 90 100 110 120 



130 140 150 160 170 179 

orf la . pep NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM 

15 I I I I I I I I I I I I I I h M I I I I I I : : I I h I I I I I I I I I I I I I I II I I I I I I 

orf 1-1 NVDFGAEGRNPDQHRFTYKI VKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTS YM 

130 140 150 160 170 180 

180 ' 190 200 210 220 230 

orf la . pep RGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDL- -SYSGA WL I GGNTHMQGWGNN 

20 | | |:::||:|||||:|::||| |:|: :: || | Mill |: ::: 

orf 1 - 1 DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

190 200 210 220 230 240 



240 250 260 270 280 290 

Orf la . pep GVXSLSGD-VRHANDYGPMPIAGAAGDSGS PMFIYDKTNNKWLLNGVLQTGYPYSGRENG 

25 |: :|::: ::|: | | : | : | : || | | | | | | | | | : : | | | : | | | | | | | | | | : | | 

orf 1-1 GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNG 

250 260 270 280 290 



300 310 320 330 340 350 

orf la . pep FQLIRKDWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQT 

I I I : I I ! I I M : I : MM MMMMM.-MMM :: :|: I I MMM 
orf 1-1 FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 
300 310 320 330 340 350 



360 370 380 390 400 410 

orf la. pep VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 

35 IMM Ml MM I II M 1 1 1 II II 1 1 M 1 1 MMMMMIMIMI 

orf 1-1 VQLFNVSLSETARE P VYHAAGGVNS YRPRLNNGEN I S F I DEGKGEL I LTSN INQGAGGLY 

360 370 380 390 400 410 

420 430 440 450 460 470 

orf la . pep FEGDFTVS PENNETWQGAGVH I S EDSTVTWKVNGVANDRLS KI GKGTLHVQAKGENQGS I 

40 I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

orf 1-1 FQGDFTVS PENNETWQGAGVH IS EDS TVTWKVNGVANDRLSKI GKGTLHVQAKGENQGS I 

420 430 440 450 460 470 



480 490 500 510 520 530 

orf la . pep SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

MMM illMIMM IMIIIMI III IIIIMIMIIIIIIIMIIMI II 

orf 1 - 1 SVGDGTVI LDQQADDKGKKQAFSE IGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

480 490 500 510 520 530 
540 550 560 570 580 590 

orf la. pep HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 

I I I I I I I I I I I I i I I I II II I I II I I- h :|:| I h Ml I I II II II 

orf 1-1 HS LS FHR I QNTDEGAM I VNHNQD KE S T VT I TGNKD I AT - TGNN - NS LDS KKE I A YNGW FG 

5 540 550 560 570 580 590 

600 610 620 630 640 650 

or f la . pep EKDTTKTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

MM MINIM MINIM I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 : : 

or f 1 - 1 EKDTTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDH 
10 600 610 620 630 640 650 

660 670 680 690 700 710 

orf la . pep WSKMEGIPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGV 

Ih NNNNNINN 1 1 N 1 1 1 N N N 1 1 N 1 1 1 II N II llllllllllll 

orf 1-1 WSQKEGIPRGEIVWDNDWINRTFKAENFQIKGGQAVVSRNVAKVKGDWHLSNHAQAVFGV 
15 660 670 680 690 700 710 

720 730 740 750 760 . 770 

orf la . pep APHQSHTICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLX 

IIIIIIIIIIIIIIIIMIIII = I I I I I I I M I I I I || I I hi 
orf 1-1 APHQSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLN 
20 720 730 740 750 760 770 

780 790 800 810 820 830 

orf la . pep GNLS ANGDTRYTVSHNATQNGNLS LVGNAQATFNQATLNGNXSXSGNAS FNLSNNAAQNG 

I N I II N 1 1 N N N 1 1 1 II 1 1 1 1 1 1 II . I II 1 1 II I N IN MIMMIhMMM 

orf 1-1 GNLS ANGDTRYTVSHNATQNGNLS LVGNAQATFNQATLNGNTS ASGNAS FNLSDHAVQNG 

25 780 790 800 810 820 830 

840 850 860 870 880 890 

orf la . pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 

Mill 1 1 III 1 1 III I Ml Ml 1 1 II II 1 1 hi II II hi hi II I MM II II III I 

orf 1-1 SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
30 840 850 860 870 880 890 

900 910 920 930 940 
orf la . pep TELGNLNLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFN 

II I IMMIII I M 1 1 1! II I II I! MM -MMIIMM 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 1-1 TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFN 
35 900 910 920 930 940 950 

950 960 970 980 990 1000 

orf la . pep TLTWGKIiNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTVVEG 

1 1 : 1 1 1 1 1 1 I N I II 1 1 N 1 1 1 1 1 1 1 II N 1 1 1 II 1 1 II 1 1 1 1 N 1 1 1 N N I I N II 

orf 1-1 TLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEG 
40 960 970 980 990 1000 1010 

1010 1020 1030 1040 1050 1060 

orf la . pep KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 

I N 1 1 1 1 II I II 1 1 1 1 N I II 1 1 N 1 1 N I II N I N 1 1 1 1 1 1 1 N II I III I II I II 

orf 1 - 1 KDNKPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAE 
45 1020 1030 1040 1050 1060 1070 

1070 1080 1090 1100 1110 1120 

orf la . pep KDNAQSLDALI AAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQR 

I I I I I I I II I I I I I I I h I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
or f 1 - 1 KDNAQSLDALI AAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDTALAKQR 

50 1080 1090 1100 1110 1120 1130 
1130 1140 1150 1160 1170 1180 

or f la . pep EAETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 

MM 1 1 MM MM MMMI MMMM MM M I II M I M 1 1 1 1 1 1 M 1 1 

orf 1- 1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP- -QRDLISRYANSGLSEFSATLNSVFAV 

5 1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orf la . pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 

MIMMMMMMMM M M M M M M M M M M M M M M M M M M M I 

orf 1-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
10 1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orf la . pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 

MMMMMMMMMMMMMMI M MMMMMM MMMMMMM 

orf 1 - 1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
15 - 1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orf la .pep . HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 

i i 1 1 1 1 1 1 [ 1 1 i i ^ 1 1 1 9 1 1 1 1 i 1 1 ! r i [ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 1-1 HYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
20 1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

orf la . pep KPAQHXSITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 

MM! MMI MMMMMMMMMMMMMMMMMMMIMMM M 

o r f 1 - 1 KPAQH ISITPYLS LS YTDAASGKVRTR VNTAVLAQD FGKTRS AEWGVNAE I KGFTLS LHA 

25 1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orf la . pep AAAKGPQLEAQHSAGI KLGYRWX 

MMMMMMMMMMMI 

orf 1-1 AAAKGPQLEAQHSAGI KLGYRWX 

30 1440 1450 
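
The "% identity over N aa overlap" figures quoted for these alignments can be illustrated with a minimal sketch. It assumes that identity is counted as identical columns divided by all aligned columns (gap columns included) and that the unknown residue X never counts as a match; the function name and both conventions are illustrative assumptions, not the settings of the program that produced the alignments above. The example strings are a short gapped window taken from the ORF1a / ORF1-1 alignment.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity).

    Both inputs must already be aligned, i.e. have equal length with
    gaps written as `gap`. Columns where both rows are gaps are skipped.
    Assumption: 'X' (unknown residue) is never counted as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    overlap = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue  # column carries no information
        overlap += 1
        if a == b and a not in (gap, "X"):
            identical += 1
    pct = 100.0 * identical / overlap if overlap else 0.0
    return identical, overlap, pct


# Short window from the ORF1a / ORF1-1 alignment (one gap in ORF1a).
orf1a_window = "KPDNS-HPYNGDXHMP"
orf1_1_window = "KAGTKGHPYGGDYHMP"
print(percent_identity(orf1a_window, orf1_1_window))
```

Applied to the full aligned pair rather than a window, the same count gives a whole-alignment identity, provided the gap and X conventions match those of the original alignment program.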

Homology with adhesion and penetration protein hap precursor of H. influenzae (accession number 
P45387) (SEQ ID NO: 1153) 

Amino acids 23-423 of ORF1 (SEQ ID NO: 648) show 59% aa identity with hap protein (SEQ ID 
NO: 1153) in 450 aa overlap: 



orf1   23
          F   +L  C+S GI  QAWAGHTYFGI+YQYYRDFAENKGKF VGAK+IEVYNK+G+LVG
hap     6 FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65

orf1   83
          SMTKAPMIDFSWSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV
hap    66

orf1  143
          KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR
hap   125

orf1  203 QYWRSDEDEPNNRESSYHIA 222 

QYWR+D+DE N SSY+++ 
hap 185 QYWRTDKDEETNVHSSYYVSGAYRYLTAGNTHTQSGNGNGTVNLSGNWS PNHYGPLPTG 244 

orfl 223 SGS PMF I YDAQKQKWL INGVLQTGNP Y I GKSNGFQLVRKDWFYDE I FAGDTHS VF 277 

5 SGSPMFIYDA+K++WLIN VLQTG+P+ G+ NGFQL+R++WFY+E+ A DT SVF 

hap 245 GSKGDSGSPMFIYDAKKKQWLINAVLQTGHPFFGRGNGFQLIREEWFYNEVLAVDTPSVF 304 

orfl 278 - - YEPRQNGKYSFNDDNNGTGKIN-AKHEHNSLPNRLKTRTVQLFNVSLSETAREPVYHA 334 

Y P NG YSF +N+GTGK+ + + + + TV+LFN SL++TA+E V A 
hap 305 QRYIPPINGHYSFVSNNDGTGKLTLTRPSKDGSKAKSEVGTVKLFNPSLNQTAKEHV-KA 363 

10 orfl 335 AGGVNS YRPRLNNGEN I S F IDEGKGEL I LTSN INQGAGGLYFQGDFTV - S PENNETWQGA 393 

A G N Y+PR+ G+NI D+GKG L + +N I NQGAGGL Y F + G + F V +NN TWQGA 
hap 364 AAGYN I YQPRME YGKN I YLGDQGKGTLT I ENN INQGAGGLYFEGNFWKGKQNN I TWQGA 423 

orfl 394 GVHISEDSTVTWKVNGVANDRLSKIGKGTL 423 
GV I +D+TV WKV+ NDRLSKIG GTL 
15 hap 424 GVS I GQDATVEWKVHNPENDRLS KI GI GTL 453 

Amino acids 715-1011 of ORF1 (SEQ ID NO: 648) show 50% aa identity with hap protein (SEQ 
ID NO: 1153) in 258 aa overlap: 

20 Orfl 41 DTRYTVSHNATQ - NGNXSLVXNAQATFNQ - ATLNGNTS ASGNAS FNLSDHAVQNGS LTLS 98 

DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 
hap 733 DTKVINSIPITQINGSINLTNNATVNIHGLAKLNGNVTLIDHSQFTLSNNATQTGNIKLS 792 

orfl 99 GNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 158 
+A A V+ + + LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 
25 hap 793 NHANATVNNATLNGNVHLTDSAQFSLKNSHFWHQIQGDKDTTVTLENATWTMPSDTTLQN 852 

orfl 159 LNLDNATITLNSAYRHDAAGAQTGSATDAPXXXXXXXXXXLLXVTPPTSVESRFNTLTVN 218 

L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

hap 853 LTLNNSTVTLNSAY S AS SNNAPRHRRS LETETTPTSAEHRFNTLTVN 899 

orfl 219 GKiNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDNKP 278 
30 . GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 

hap 900 GKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYTLSVRNTGKEPVTLEQLTLIESLDNKP 959 

orfl 279 LSENLNFTLQNEHVDAGA 296 

LS+ L FTL+N+HVDAGA 
hap 960 LSDKLKFTLENDHVDAGA 977 
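
The middle line of each orf1/hap block above follows the usual convention of showing the residue letter at identical positions, "+" at conservative substitutions, and a blank elsewhere. A minimal sketch of how such a line can be generated is given below. The set of "similar" residue pairs is a small hand-picked assumption for illustration only; the original search program would have derived "+" from a full substitution matrix, which is not reproduced here.

```python
# Illustrative (assumed) set of conservative substitution pairs.
# A real search program would use a substitution matrix instead.
SIMILAR_PAIRS = {
    frozenset(pair)
    for pair in ("DE", "EQ", "KR", "IV", "IL", "LV", "ST", "FY", "DN")
}


def match_line(query: str, subject: str) -> str:
    """Build the identity/'+' line for two equal-length aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    out = []
    for q, s in zip(query, subject):
        if q == s and q not in "-X":
            out.append(q)          # identical residue
        elif frozenset((q, s)) in SIMILAR_PAIRS:
            out.append("+")        # conservative substitution (assumed set)
        else:
            out.append(" ")        # mismatch or gap
    return "".join(out)


# Last block of the second orf1/hap alignment above.
orf1 = "LSENLNFTLQNEHVDAGA"
hap = "LSDKLKFTLENDHVDAGA"
print(orf1)
print(match_line(orf1, hap))
print(hap)
```

For this block the sketch gives the same middle line as the alignment shown above; other blocks may differ wherever the assumed pair set and the original scoring matrix disagree.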


Amino acids 1192-1450 of ORF1 (SEQ ID NO: 648) show 41% aa identity with hap protein (SEQ 
ID NO: 1153) in 259 aa overlap: 

Orfl 1 LDRVFAEDRRNAVWTSG I RDTKHYRSQDFRAYRQQTDLRQ I GMQKNLGSGRVG I LFSHNR 60 
LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 
40 hap 1135 LDRLFVDQAQS AVWTN I AQDKRRYDSDAFRAYQQKTNLRQ I GVQKALANGR I GAVFSHSR 1194 

orfl 61 TENTFDDG I GNS ARLAHGAVFGQYG I DRF YXXXXXXXXXXXXXXXXX I GXKXRRRVLHYG 12 0 

. ++NTFD+ +NAL+FQY KR+ ++YG 

hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKAINYG 1254 
orf1  121 IQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPA 180
          + A Y+   G  GI+P+ G  RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P
hap  1255 VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 1314

orf1  181 QHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240
           +IS+ PY  ++Y D ++  V+T VN  VL Q FG+    E G+ AEI  F +S   + +
hap  1315 DNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEVGLKAEILHFQISAFISKS 1374

orf1  241 KGPQLEAQHSAGIKLGYRW 259
          +G QL  Q + G+KLGYRW
hap  1375 QGSQLGKQQNVGVKLGYRW 1393 





Homology with a predicted ORF from N. gonorrhoeae 

The blocks of ORF1 (SEQ ID NO: 648) show 83.5%, 88.3%, and 97.7% identities in 467, 298, and 
259 aa overlap, respectively, with a predicted ORF (ORF1ng) (SEQ ID NO: 654) from 
N. gonorrhoeae: 



orf 1 . pep MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 60 

15 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 M M Mill llllll MM I II II III Mill MM 1 1 

orf lng MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 60 

orf 1 . pep KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFS WSRNGVAALVGVQY I VS VAHNGGYN 120 

M I i 1 1 1 1 1 M M 1 1 1 M 1 1 1 ! I II 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 hi Ml MINIM 

orf lng KGKFAVGAKDI EVYNKKGELVGKSMTKAPM I DFS WSRNGVAALAGDQY I VS VAHNGGYN 120 

20 orf 1 .pep NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 180 

III IM I II I M M II 1 1 M 1 1 1 1 M I M 1 1 1 II I M 1 1 Ml llllll 

orf lng NVDFGAEGSN - PDQHRFSYQI VKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTS Y 17 9 

orf 1 .pep MDGRKY I DQNNYPDRVRI GAGRQ YWRSDEDE PNNRESS YH IAS 223 

III M I h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 II 1 1 1 1 1 1 M 1 1 1 

25 or f lng MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESS YHIASAYSWLVGGNTFAQNGSG 23 9 

orf 1. pep GS PM F I YDA QKQ KWL I N GVLOTGNP Y I GKSNG 255 

II I I I I I II I I I I II I I M I M M I I I I M I 

orflng GGTVNLGSEKIKHSPY GFLPTGGS FGDS GS PM F I YD A QKQKWL I N GVLOTGNP Y I GKSNG 289 



orf 1 . pep FQL VRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 

30 || Ml || MIMI MM Mill MM I II MIMIMIIIMIMI Ml llllll 

orflng FOLV RKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRT 359 

orf 1. pep VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 375 

I I I I I I I I II I I I I I I I I I II I I I I M I I II I I I I I II II • I I I I I I I I II II II II II I 
orflng VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

35 orf 1 .pep FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGT 422 

M >M 1 1 IM II I M 1 1 M 1 1 1 M 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 

orflng FEGNFTVS PKNNETWQGAGVHI SDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 4 79 

// 

orf 1 . pep DKVTASLTKTDISGNVDLADHAHLNLTGLA 744 

40 1 1 1 1 M M M 1 1 M I I II M 1 1 1 1 M 

orflng FGVAPHQSHT I CTRSDWTGLTS CTEKT I TDDKV I AS LS KTDVRGNVSLADHAHLNLTGLA 774 
orf 1 . pep TLNGNLSANGDTR- YTVSHNATQNGNXSLVXNAQATFNQATLNGNTS ASGNAS FNLSDHA 803 

hllll ::::|| : lllllll III I I I I I I I I I I I I II I I II hllhhh 
O r f 1 ng T FNGNL - VQAETRT I RLRANATQNGNLS LVGNAQATFNQATLNGNTS AS DNAS FNLSNNA 833 

orf 1 .pep VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 863 

5 Illllllll I I I I I I I M M I I I I I I I M II I I :| I I I hi M I I I I I I I I I I I I I I 

orf lng VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 893 

orf 1 . pep LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 923 

hhllhlhlhhh hlllhlllllllhllllllll I II MMIhl 

orf lng LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

10 orf 1 .pep SRFNTLTWGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 

I I I I I I I I I I I I I I I I I I I ■ I I I I I I I I I I I II I I I I I I I I I I I II I hi I I I I I 
orf lng SRFNTLTWGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

orf 1 . pep WEGKDNKPLSENLNFTLQNEHVDAGAW 1011 

lllllll llllllllllllllllllll 
15 orf lng WEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

// 

orf 1. pep LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 

llllllllllllllllllllllllllllll 
orf lng PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 123 9 

20 orf 1 .pep AYRQQTDLRQ I GMQKNLGSGRVG I LFSHNRTENTFDDG I GNS ARLAHGAVFGQ YG I DRFY 1271 

I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orflng AYRQQTDLRQ I GMQKNLGSGRVG I LFSHNRTGNTFDDG I GNS ARLAHGAVFGQ YG I GRFD 1299 

orf 1 .pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331 

I I I i ill I h I II II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
25 orflng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359 

orf 1 .pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 13 91 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
orf lng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHIS ITPYLSLSYTDAASGKVRTRVNTAVL 1419 

orf 1 .pep AQDFGKTRSAEWGVNAE I KGFTLSLHAAAAKGPQLEAQHSAGI KLGYRW 144 0 

30 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orflng AQDFGKTRSAEWGVNAE I KGFTLSLHAAAAKGPQLEAQHSAGI KLGYRW 1468 

The complete length ORF1ng nucleotide sequence was identified (SEQ ID NO: 653): 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA 

51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC 

301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 

401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 

451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 

551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 

601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG 

1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT 

1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 

1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG 

1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 

1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 

1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 

1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT 

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC 

1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 

1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 

2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA 

2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 

23 51 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG 

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA 

2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 

2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC 

2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA 

2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC 

2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC 

2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC 
3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA 

3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg 

3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc 

3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc 

3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA 

3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg 

3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG 

3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA 

3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG 

3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG 

3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC 

3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC 
3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA 
3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC 
3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC 

3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA 

3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC 
3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC 

3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT 

3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT 

4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG 
4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC 
4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT 
4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG 
4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC 
4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA 
4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG 
4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG 
4401 CTGGTAA 
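
The relationship between a nucleotide listing of this kind and its predicted protein can be illustrated by conceptual translation in reading frame 1. The sketch below assumes the standard genetic code and simply ignores whitespace and letter case in the input; the function and variable names are illustrative and are not taken from the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(nucleotides: str) -> str:
    """Translate a DNA string in frame 1; '*' marks a stop codon.

    Whitespace is removed and case is ignored. Codons containing
    unexpected characters become 'X'; a trailing partial codon is dropped.
    """
    seq = "".join(nucleotides.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 50 nucleotides of the sequence above (one listing row).
print(translate("ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA"))
# Last 15 nucleotides, ending in the TAA stop codon.
print(translate("GGCTACCGCTGGTAA"))
```

Because the listing is grouped in blocks of ten, stripping whitespace before translating keeps the reading frame intact across block and line boundaries.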

This is predicted to encode a protein having amino acid sequence (SEQ ID NO: 654): 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT 

151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 

301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS 

351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK 

401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK 

451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD 

601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK 

701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS 

751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL 

801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS 

851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL 

901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE 

951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG 

1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR 

1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA 

1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA 

1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL 

1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI 

1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI 

1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT 

1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL 

1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK 

1451 GPQLEAQHSA GIKLGYRW* 

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 
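
A motif of the ATP/GTP-binding site motif A (P-loop) type can be located with a simple pattern scan. The sketch below assumes the commonly cited consensus [AG]-x(4)-G-K-[ST], written here as a regular expression; the pattern and the function name are assumptions for illustration and do not necessarily reflect the tool or pattern definition used to annotate the sequence above. The example input is the listing row covering residues 281-300 of SEQ ID NO: 654.

```python
import re
from typing import List, Tuple

# Assumed consensus for motif A (P-loop): [AG]-x(4)-G-K-[ST]
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit.

    Whitespace in the input (e.g. blocks of ten residues) is ignored.
    """
    seq = "".join(protein.split()).upper()
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(seq)]


# Residues 281-300 as printed in the sequence listing above.
fragment = "WLINGVLQTG NPYIGKSNGF"
print(find_p_loops(fragment))
```

Positions are reported relative to the fragment that is passed in, so an offset (here 280) has to be added to recover numbering in the full-length protein.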



ORF1-1 (SEQ ID NO: 650) and ORF1ng (SEQ ID NO: 654) show 93.7% identity in 1471 aa 
overlap: 



10 20 30 40 50 60 

orf 1-1. pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 

Mill 1 1 M M 1 1 M 1 1 II II I M M 1 1 M ! 1 1 I II I II 1 1 1 1 1 II 1 1 1 1 1 1 
orf lng-1 MKTTDKRTTETHRKAPKTGR I RFS PAYLAI CLS FG I LPQARAGHTYFG I NYQYYRDFAEN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 1- 1 . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

5 1 1 1 1 1 1 1 M 1 1 1 i II M 1 1 1 M h M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I hi i 1 1 1 1 I M 1 1 1 1 1 

orf lng- 1 KGKFAVGAKD I EVYNKKGELVGKSMTKAPM I DFS WSRNGVAALAGDQY I VS VAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 1 - 1 . pep NVDFGAEGRNPDQHRFTYKI VKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

10 Mill I II Mill Ihhlllllll III hllllllll MINI MM MINIUM I 

orf lng- 1 NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 1-1. pep DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

15 II || | MIMIIM MM MIMMMMMMM MM IIMMIIIIMI 

orf lng- 1 DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 1- 1 . pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
20 || | | | | | | || | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 

orf lng-1 GTVNLGSEKIKHSPYGFLPTGGSFGDSGS PMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 1- 1 . pep QLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 

25 1 1 ii iiiiiiiu ii i ii ii i ii hiii ii 1 1 hi i hi i hi i hi mi i ii ii 1 1 

orf lng-1 QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 1-1. pep QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 
30 | | | | | | | | | | | | || | | | | | | | | | | | | | | | | || | || | | | |: | | | | || | || | | | | | | || | | | 

orf lng-1 QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLYF 

370 380 390 400 410 420 

430 440 450 460 " 470 480 

orf 1 - 1 . pep QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGS IS 

35 : |: || | | |: | | | | | | | | | | | | |: | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |: I 

orf lng-1 EGNFTVS PKNNETWQGAGVH I SDGSTVTWKVNGVANDRLS KI GKGTLLVQAKGENQGS VS 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 1- 1 . pep VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

40 II 1 1 1 1 1 1 1 1 1 1 h I I 1 1 I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 I I 1 1 1 II 1 1 1 1 1 

orf lng-1 VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf 1- 1 . pep SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 

45 MMMMMMMMMMMMMIMM MM IIMMMMMMIII Mill 

orf lng-1 S LS FHR I QNTDEGAM I VNHNQDKESTVT I TGNKD I TTTGNNNNLDS KKE I AYNGWFGEKD 

550 560 570 . 580 590 600 



610 620 630 640 650 660 

orf 1-1 .pep TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 
MINIMI Ml I M M M M M 1 1 1 M 1 1 i 1 1 1 1 1 M M 1 1 1 1 1 1 1 - lh 

ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 
610 620 630 640 650 660 

670 680 690 700 710 720 

KEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGVAPH 

1 1 1 M II 1 1 M I II M I II I M IMM II M M 1 1 1 1 1 N I II I II I II M II M 1 1 

MEG I PQGE I VWDNDW I DRT FKAENFH I QGGQAWS RNVAKVEGDWHLSNHAQAVFGVAPH 
670 680 690 700 710 720 

730 740 750 760 770 780 

QSHT I CTRSDWTGLTNCVEKT I TDDKVI ASLTKTD I SGNVDLADHAHLNLTGLATLNGNL 

I II II I I II I I II Ml M I IM I I I I I II I M M I I I I : I I I I I I I I I I I I I I I I I I I 
QS HT I CTRSDWTGLTS CTE KT I TDDKV IAS LS KTD I RGNVS LADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 

790 800 810 820 830 840 

SANGDTRYTVSHNATQNGNLS LVGNAQATFNQATLNGNTS ASGNAS FNLSDHAVQNGSLT 

I M MM I MM I I I M I I M I I I I I I I I I I I M I I II I I IMIINMM III 
SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 
790 800 810 820 830 840 

850 860 870 . 880 890 900 

LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 

II IMMMMMI MMMIMMIMM IMIIIIIIIIIIIIIIIIII MM 

LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 
850 860 870 880 890 900 

910 920 930 940 950 960 

GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 

1 1 1 1 II 1 1 ! I 1 1 1 1 1 M 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 MINI II III :| III II II 

GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR RSLLSVTPPTSAESRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

VNGKLNGQGTFRFMSELFGYRSDKiKLAESSEGTYTLAVNNTGNEPASLEQLTVVEGKDN 

MIIMMMIMM Mill 1 1 1 1 1 1 1 1 1 1 1 1 1 M I M 1 1 Ml I ! 1 1 M I II 1 1 

WGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN 
960 970 980 990, 1000 1010 

1030 1040 1050 1060 1070 
KPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKA 

llllll Mill II I MM II II I II III III I I II II I I III I II II I I 
TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 

1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 
EAKKQAEKDNAQSLDAL I AAGRDAVEKTES VAE PARQAGGENVG IMQAEEEKKRVQ 

I IM 1 1 1 1 1 1 1 M M M II MM M I M II 1 1 1 M I II M I M II M I II I II I 

QAQLAAKQQAEKDNAQSLDAL I AAGRNATEKAES VAE PARQAGGENAG IMQAEEEKKRVQ 
1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 

II I I II M II M I I I II I I II I I I II II I I I I I I IMIMI MMMMMIIMM 
ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 
1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 
or f 1 - 1 . pep ATLNS VFAVQDELDRVFAEDRRNAVWTSGI RDTKHYRSQDFRAYRQQTDLRQ I GMQKNLG 

1 1 M 1 1 1 II 1 1 1 1 1 II I M 1 1 1 : II 1 1 1 1 1 1 ' 1 1 1 1 1 Ml M 1 1 1 1 1 II 1 1 II 1 1 1 1 

orf lng- 1 ATLNS VFAVQDELDRVFAEDRRNAVWTSG I RDTKHYRSQDFRAYRQQTDLRQ I GMQKNLG 

1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orf 1-1 .pep SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 

lllllllllllll 1 1 1 1 1 1 1 M 1 1 Ml 1 1 ; 1 1 1 II I 1 1 1 1 M I M 1 1 M M I M 

orf lng- 1 SGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 
1260 1270 1280 1290 1300 1310 






1310 1320 1330 1340 1350 1360 

orf 1-1 .pep GGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 

M 1 1 M 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M 1 1 M I M 1 1 M 1 1 1 ! 1 1 1 1 M 1 1 1 M 1 1 1 1 1 1 1 

orf lng- 1 RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

or f 1 - 1 . pep AG I KAD YS FKPAQH I S I TP YLS LS YTDAASGKVRTRVNTAVLAQD FGKTRS AEWGVNAE I 

Mill MMIIi! IMIIIII IMIIIII iMMMMMMMMIMIIilM 

orf lng- 1 AG I KAD YS FKPAQH I S I TPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRS AEWGVNAE I 

1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orf 1 - 1 . pep KGFTLSLHAAAAKGPQLEAQHSAGI KLGYRWX 

I II III I II MM III I II MM III MM II 

orf lng- 1 KGFTLSLHAAAAKGPQLEAQHSAGI KLGYRWX 

1440 1450 1460 



In addition, ORF1ng (SEQ ID NO: 654) shows 55.7% identity with hap protein (P45387) (SEQ ID 
NO: 1153) over a 1455 aa overlap: 

SCORES Init1: 1104 Initn: 4632 Opt: 2680 

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap 
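
A Smith-Waterman score is the best score over all local alignments of two sequences. A minimal sketch of the recurrence follows. It assumes a flat match/mismatch score and a linear gap penalty chosen purely for illustration, so it will not reproduce the score of 5165 reported above, which depends on the substitution matrix and gap penalties of the original program (not stated here).

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of `a` against `b`.

    Scoring is a simplification (assumed values): +match for identical
    residues, +mismatch otherwise, +gap per gapped position.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# The six-residue stretch GDSGSP occurs in both proteins compared here.
print(smith_waterman_score("FGDSGSPMFIYDA", "KGDSGSPMFIYDA"))
```

The floor at zero is what makes the alignment local: a poorly matching prefix cannot lower the score of a well-conserved region further along the sequences.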






10 20 30 40 50 60 

orf lng- 1 . pep MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 

I M: :MMM II I I I I I I I I : I I I i I I I i I I 
p453 87 M KKT VFRLN FLT AC I S LG I VSQ AW AGHT Y FG I D YQ Y YRDF AEN 

10 20 30 40 






70 80 90 100 110 120 

orf lng- 1 . pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 

MIMM-M MMMIM 1 1 II I II 1 1 M I M M I II I h MIMIIMI IM 

p453 87 KGKFTVGAQNI KVYNKQGQLVGTSMTKAPM I DFS WSRNGVAALVENQ Y I VSVAHNVGYT 

50 60 70 80 90 100 






130 140 150 160 170 180 

orf lng- 1 . pep NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

MMMIMMI MMIMI MM I III ill 11111111 = 1 MMM I 

p4 5 3 8 7 DVDFGAEGNNPDQHRFTYKI VKRNNYKKD - NLHP YEDDYHNPRLHKFVTEAAP IDMTSNM 

110 120 130 140 150 160 






190 200 210 220 230 240 

orf lng- 1 . pep DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

= I hi : I I I : I I I I I = I I I = I I : I : I : = -MM M-IM I hh 

p45387 NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN 
170 180 190 200 210 

250 260 270 280 290 300 

orf lng- 1 . pep GTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

I II:: I : II II HI I II I I I I I I I I I = I I I I I I I h I = llh II Ml 
5 p453 8 7 GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 

220 230 240 250 260 270 

310 320 330 340 350 360 

orf lng- 1. pep QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
lllh: | | | | | |: :| || | :: |:|| |:| | ::| ::| : 

10 p4 5 3 8 7 QLVRKSYF- DE I FERDLHTSLYTRAGNGVYT I SGNDNGQGS I TQKS GIPSEIK I 

280 290 300 310 320 

370 380 390 400 410 419 

orf lng- 1. pep QLFNVS LSETARE P VYHAA - GGVNS YRPRLNNGEN I S F IDKGKGEL I LTSNINQGAGGLY 

I 1 = 11 :: h: III 111111- hh I I I - h I I I I I I I II 

15 p45387 TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 

330 340 350 360 370 380 

420 430 440 450 460 470 479 

orf lng- 1 . pep FEGNFTVS PKNNETWQGAGVH I SDGSTVTWKVNGVANDRLS KI GKGTLLVQAKGENQGS V 

INI ll|::|:| MI|:|:|::|IMllhl : I I I I I I I I I I I I I I I I I I : I h 
20 p4 5 3 8 7 FEGNFTVS PNSNQTWQGAG IHVSENS TVTWKVNGVEHDRLS KI GKGTLHVQAKGENKGS I 

390 400 410 420 430 440 

480 490 500 510 520 530 539 

orf lng- 1 . pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

MMMIMIirihlMM llllllll MM hlh 1 1 = II II 1 1 1 1 1 1 1 II 

25 p4 53 87 S VGDGKVI LEQQADDQGNKQAFS E I GLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 

450 460 470 480 490 500 

540 550 560 570 580 590 

orf lng- 1. pep HS LS FHRI QNTDEGAM I VNHNQDKESTVT I TGNKD I TT - TGNN - NNLDS KKE I AYNGWFG 

I I I : h I I I I I I I I ' I I I I I : ::||||||::|: : I I I I : I I = I I I I I I I I I I 
30 p4 53 87 HSLTFKRIQNTDEGAMIVNHNTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 

510 520 530 540 550 560 

600 610 620 630 640 650 

orf lng- 1 . pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

I I MINI hi I I II II I I I I I M: I I IM I I I I I I I I I I I I I I I I - 

35 p4 53 87 ETDKNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 

570 580 590 600 610 620 

660 670 680 690 700 710 

orf lng- 1 . pep WSKMEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV 

I: I I II I M I I I I : I I h I M II I I I : I : M : I I II I I I-: I I M HhhhMI 
40 p4 53 87 WS EMEG I PQGE I VWDHD W I NRT F KAENFQ I KGGS A WS RNVS S I EGNWT VSNNANAT FGV 

630 640 650 660 670 680 

720 730 740 750 760 770 

orf lng- 1 . pep APHQSHT I CTRSDWTGLTS CTEKT I TDDKVI ASLS KTD I RGNVSLADHAHLNLTGLATLN 
:| :|::|M I I I I I I I I h : : | | | | | | : | | : | | -: : : | : | : | h Ml II 
45 p4 5 3 8 7 VPNQQNT I CTRSDWTGLTTCQKVDLTDTKVINS I PKTQ INGS I NLTDN AT ANVKGLAKLN 

690 700 710 720 730 ' 740 

780 790 800 810 820 830 



orf lng- 1 .pep 
p45387



GNVTL- 

750 



- TNHSQFTLSNNATQIG 
760 770 



840 850 860 870 880 890 

orf lng- 1 . pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 

:: ||||: | = | = :: Mill hhl I = MIM = MM I I- | = = : MM 
P4 53 87 NIRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 

780 790 800 810 820 830 






900 910 920 930 940 950 

orf lng- 1 . pep TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 

I I IIMMMMIMM = M = -MM I I : I III IMIM 

p4 5 3 8 7 TTLQNLTLNNST I TLNSAY SASSNNTPRRRS LETETTPTS AEHRFNTLT 

840 850 860 870 






960 970 980 990 1000 1010 

orf lng- 1 . pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTVVEGKDN 

IIIIIMIIIIM I IMIM IMh-MI I IM IMMI MMMMIMII 

p453 8 7 VNGKLSGQGTFQFTS S LFGYKSDKLKLSNDAEGDY I LS VRNTGKEPETLEQLTLVES KDN 

880 890 900 910 920 930 






1020 1030 1040 1050 1060 1070 

orf lng- 1 . pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 

llhMMIIMMIIIII ll = l = = MIIIIIIIIMIIII : I M - M M 

p4 5 3 8 7 QPLSDKLKFTLENDHVDAGALRYKLVKNDGEFRLHNP I KEQELHNDLVRAEQAERTLEAK 

940 950 960 970 980 990 






1080 1090 1100 1110 1120 1130 

orf lng- 1 . pep Q AQLAAKQQAEKDNAQS LD AL I AAGRNAT - E KAE S VAE P ARQAGGENAG I MQ AE EE KKR V 

h: Ml h = :::| I II - = = = I IMI M = = = = : IM 
p4 5 3 8 7 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE - LTAETQKS KAKTKKV 

1000 1010 1020 1030 1040 1050 






1140 1150 1160 1170 1180 1190 

orf lng- 1 . pep QADK- - - DTALAKQREAETRPATTAFPRARRARRD - LPQPQPQPQPQPQRDLI SRYANSG 

:: : =11 = I - :::: = | I I : = I : IMIMIIMh 

p4 53 87 RSKRAVFSDPLLDQSLFALEAALEVIDAPQQSEKDRLAQEEAEKQ-RKQKDLISRYSNSA 
1060 1070 1080 1090 1100 1110 






1200 1210 1220 1230 1240 1250 

orf lng- 1 .pep LS E FS ATLNS VFAVQDELDRVFAEDRRNAVWTSG I RDTKHYRSQDFRAYRQQ - TDLRQ I G 

IMMMM|::MMMMM::: = = 1111= =1 = M h 1111 = 11 1 = 11111 
p4 5387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170 






1260 1270 1280 1290 1300 1310 

orf lng- 1. pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 

Ml hMIM =111 = 1= 1111= = 111= = 1 = 11 I :::| = : = |s|:|:: 
p4 53 8 7 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 






1320 1330 1340 1350 1360 1370 

orf lng- 1 . pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL 

• :::= ||:| = = = = lhM h =1 = I I = I = = I = = I I I = = = =1= 1 = 1 = 11 = 1 

p453 87 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSL 

1240 1250 1260 1270 1280 1290 



orf lng- 1 .pep 



1380 1390 1400 1410 1420 1430 

AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 
p45387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV
1300 1310 1320 1330 1340 1350 

1440 1450 1460 1469 

orf1ng-1.pep GVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX

I :| : ::| II h-hllllll 
p4 5 3 8 7 GLKAE I LHFQ I SAF I SKSQGSQLGKQQNVGVKLGYRW 

1360 1370 1380 1390 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
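
The Smith-Waterman score quoted with the comparison above is a local alignment score. As an illustration only, the following minimal sketch computes such a score by dynamic programming. It assumes simple identity scoring with a linear gap penalty; the function name and the values match=2, mismatch=-1 and gap=-2 are hypothetical and are not the substitution matrix or gap parameters of the program that produced the reported scores, so it will not reproduce those numbers.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Illustrative scoring only (assumed values): +match for identical
    residues, mismatch otherwise, and a linear penalty per gap position.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                cur[j - 1] + gap,   # gap in a
            )
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    # Two short fragments sharing the local segment "KVWQ".
    print(smith_waterman_score("AAAKVWQ", "KVWQTTT"))
```

Only two rows of the matrix are kept, because the score (not the traceback) is all that is reported here.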



Example 78 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 655):

1 ..AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA
51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG
101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG
151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA
201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG
251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA
301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG
351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC
401 GTTTGAAAGT GTTCGGCGCA TAA

This corresponds to the amino acid sequence (SEQ ID NO: 656; ORF6):

1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK
101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA *
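
The correspondence between a nucleotide sequence and the amino acid sequence it encodes can be checked by conceptual translation. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` is hypothetical, and rendering any codon that contains an undetermined base (such as the "." in SEQ ID NO: 655) as "X" is an assumption chosen to mirror the "X" shown in ORF6.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1.

    Whitespace is ignored and case is folded. Any codon containing a
    character other than A, C, G or T is reported as 'X' (assumption).
    A trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 positions of SEQ ID NO: 655 as printed above.
    print(translate("AAGGTGTGGC AATTTGTCGA AGA.CCGCTG"))  # KVWQFVEXPL
```

Applied to the first line of the partial sequence, the sketch returns the first ten residues listed for ORF6, including the undetermined position.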

Further sequence analysis revealed a further partial DNA sequence (SEQ ID NO: 657): 

1 ..CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT
51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA
101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC
151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC
201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT
251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG
301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA
351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA

This corresponds to the amino acid sequence (SEQ ID NO: 658; ORF6-1):

1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL
101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.meningitidis (strain A) 
ORF6 (SEQ ID NO: 656) shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) (SEQ
ID NO: 660) from strain A of N. meningitidis: 



                                                  10        20        30
orf6.pep                                  KVWQFVEXPLRAVVPADSFEPTAQKLNLFK
                                          |||||||  |||||||||||||||||||||
orf6a       QIVEHAVLHTPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFK
                   40        50        60        70        80        90

                    40        50        60        70        80        90
orf6.pep    AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf6a       AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY
                  100       110       120       130       140       150

                   100       110       120       130       140
orf6.pep    NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||
orf6a       NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
                  160       170       180       190       200

The complete length ORF6a nucleotide sequence (SEQ ID NO: 659) is:


1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA
51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG
101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC
151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT
201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG
251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT
301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC
351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG
401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT
451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA
501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG
551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC
601 GCATAA

This is predicted to encode a protein having amino acid sequence (SEQ ID NO: 660):

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA
51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY
101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH
151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG
201 A*

ORF6a (SEQ ID NO: 660) and ORF6-1 (SEQ ID NO: 658) show 100.0% identity in 131 aa
overlap:



                    50        60        70        80        90       100
orf6a.pep   TPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFKAGAATILFY
                                          ||||||||||||||||||||||||||||||
orf6-1                                    LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                                  10        20        30

                   110       120       130       140       150       160
orf6a.pep   EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf6-1      EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
                    40        50        60        70        80        90

                   170       180       190       200
orf6a.pep   KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
            ||||||||||||||||||||||||||||||||||||||||||
orf6-1      KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
                   100       110       120       130

Homology with a predicted ORF from N. gonorrhoeae 

ORF6 (SEQ ID NO: 656) shows 95.7% identity over a 140aa overlap with a predicted ORF 
(ORF6ng) (SEQ ID NO: 662) from N. gonorrhoeae: 

orf6.pep                                  KVWQFVEXPLRAVVPADSFEPTAQKLNLFK 30
                                          |||||||  ||||||||||||||||| |||
orf6ng      SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFK 64

orf6.pep    AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 90
            |||||||||||||||||||||||||||||||||||||||||||||||||||| |||||||
orf6ng      AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY 124

orf6.pep    NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA 140
            ||||| |||||||||||||||||||||||||||||| |||||||||||||
orf6ng      NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174
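
The percentage identities quoted for these comparisons can be recomputed from the aligned rows. The following is a minimal sketch, assuming that identity is counted over every aligned column (gap columns included in the denominator); the function names and that convention are assumptions made for illustration, not a description of the alignment program's own calculation.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percentage of aligned columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    identical = sum(
        1 for x, y in zip(row_a, row_b) if x == y and x != gap
    )
    return 100.0 * identical / len(row_a)


def mismatch_positions(row_a: str, row_b: str) -> list[int]:
    """1-based alignment columns at which the two rows differ."""
    return [i for i, (x, y) in enumerate(zip(row_a, row_b), start=1) if x != y]


if __name__ == "__main__":
    # Last block of the ORF6 / ORF6ng comparison (ORF6 residues 91-140).
    orf6 = "NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA"
    orf6ng = "NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA"
    print(round(percent_identity(orf6, orf6ng), 1))  # 96.0
    print(mismatch_positions(orf6, orf6ng))          # [6, 37]
```

For this 50-column block two columns differ, giving 96.0%; the figure quoted for the whole comparison is taken over the full 140 aa overlap.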

The complete length ORF6ng nucleotide sequence (SEQ ID NO: 661) was identified as: 



1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT
51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT
101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC
151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC
201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG
251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG
301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC
351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA
401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC
451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga
501 acgtttgAAA GTGTTCGGCG CATAA






This encodes a protein having amino acid sequence (SEQ ID NO: 662):

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA
51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA
101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI
151 GGIEGAAGEK VFEPVAERLK VFGA*



ORF6ng (SEQ ID NO: 662) and ORF6-1 (SEQ ID NO: 658) show 96.9% identity in 131 aa 
overlap: 
                                                  10        20        30
orf6-1.pep                                LRAVVPADSFEPTAQKLNLFKAGAATILFY
                                          ||||||||||||||||| ||||||||||||
orf6ng      PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFKAGAATILFY
                 20        30        40        50        60        70

                    40        50        60        70        80        90
orf6-1.pep  EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA
            ||||||||||||||||||||||||||||||||||||||||||| |||||||||||| |||
orf6ng      EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA
                 80        90       100       110       120       130

                   100       110       120       130
orf6-1.pep  KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX
            ||||||||||||||||||||||||||| ||||||||||||||
orf6ng      KAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGAX
                140       150       160       170

It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 79 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 663):



1 . . GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGG^ CAATGCCAAC 

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA

251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC 

301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 

351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 

401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 

451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 

501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 

551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 

601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC . . 

This corresponds to the amino acid sequence (SEQ ID NO: 664; ORF23): 



1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL
51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX
101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK
151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN
201 QDWKLKAEYD Y..



Further work revealed the complete nucleotide sequence (SEQ ID NO: 665): 



1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 666; ORF23-1): 



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number
P38047) (SEQ ID NO: 1154)

ORF23 (SEQ ID NO: 664) and PupB protein (SEQ ID NO: 1154) show 32% aa identity in 205aa
overlap:



Orf23    6 FARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRK 65
           ++RG  I NY+++G+P +  L D  + + A ++RVE+VRG  GL+ G G PSAT+NL+RK
PupB   215 WSRGFAIQNYEVDGVPTSTRL-DNYSQSMAMFDRVEIVRGATGLISGMGNPSATINLIRK 273

Orf23   66 RLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFXXXXXXXXXXXXXXAE 125
           R T +    +  EAGN   +G   DVSG L     +RGR V+ +
PupB   274 RPTAEAQASITGEAGNWDRYGTGFDVSGPLTETGNIRGRFVADYKTEKAWIDRYNQQSQL 333

Orf23  126 LYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYD--SQGYATAFGPKDNPATNWAN 183
           +YGI E+D++  T +     Y   +   D+PL   +    S G  T      N A +W+
PupB   334 MYGITEFDLSEDTLLTVGFSY--LRSDIDSPLRSGLPTRFSTGERTNLKRSLNAAPDWSY 391

Orf23  184 SHHRALNLFAGIEHRFNQDWKLKAE 208
           + H   + F  IE +    W  K E
PupB   392 NDHEQTSFFTSIEQQLGNGWSGKIE 416





Homology with a predicted ORF from N. meningitidis (strain A) 



ORF23 (SEQ ID NO: 664) shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) 
(SEQ ID NO: 668) from strain A of N. meningitidis: 



                                                  10        20        30
orf23.pep                                 GYNYLFARGSRIANYQINGIPVADALADTG
                                          ||||||||||||||||||||||||||||||
orf23a      QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG
                   90       100       110       120       130       140

                    40        50        60        70        80        90
orf23.pep   NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD
            |||||||||||||||||||||||||||||||||||| |||||||||||||||||||| ||
orf23a      NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD
                  150       160       170       180       190       200

                   100       110       120       130       140       150
orf23.pep   VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK
            |||||| |  |||||||||||||||| ||||| ||||||||||||||||||| |||||||
orf23a      VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK
                  210       220       230       240       250       260

                   160       170       180       190       200       210
orf23.pep   ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD
            |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
orf23a      ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD
                  270       280       290       300       310       320

orf23.pep   Y
            |
orf23a      YTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTHSASVSLIGKYRLFGREHDLIA
                  330       340       350       360       370       380

The complete length ORF23a nucleotide sequence (SEQ ID NO: 667) is: 



1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC
501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 
551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 
601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 
1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 
1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 668): 



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP
351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR
601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA
651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*

ORF23a (SEQ ID NO: 668) and ORF23-1 (SEQ ID NO: 666) show 99.2% identity in 725 aa
overlap:


10 20 30 40 50 60 

1 5 orf 23a . pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 23 - 1 MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 40 50 60 



70 80 90 100 110 120 

20 orf 23a . pep PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG 

1 1 1 1 1 M i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M h II I M I i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 

orf 23 - 1 PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 

70 80 90 100 110 120 



130 140 150 160 170 180 

25 orf 23a . pep SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRPTR 

I I I II II I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I M 
orf 23 - 1 SRIANYQINGIPVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 

130 140 150 160 170 180 



190 200 210 220 230 240 

30 orf 23a . pep KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 I Ml II I hi I II II II II II I II Nihil II II MM 1 1 

orf 23 - 1 KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

190 200 210 220 230 240 



250 260 270 280 290 300 

35 orf 23a . pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

I I I I II I I I I I I I I I I I I I I I I I I II I II M I I I I I I II I I II I II I I II I I II II I I II 
orf 23 - 1 LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

250 260 270 280 290 300 



310 320 330 340 350 360 

40 orf 23a . pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLS IDHNTAATDLI PGYWHADPRTH 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 M 1 1 1 II 1 1 1 1 1 1 1 1 1 1 II II I II I II 1 1 II 

orf 23 - 1 NLFAG I EHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLS IDHNTAATDLI PGYWHADPRTH 

310 320 330 340' 350 360 



370 380 390 400 410 420 

45 orf 23a. pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 

I II I I I I I I I M II I I I II I M I I I I I I I II I I I I I I I I M I I I I I I I I I I II I I I I I I I 
orf23-l SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 

370 380 390 400 410 420 



                     430       440       450       460       470       480
orf23a.pep   FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT
             |||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf23-1      FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT
                     430       440       450       460       470       480

490 500 510 520 530 540 

orf 23a . pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I I I I I I I I I I I 
orf 23 - 1 PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

490 500 510 520 530 540 



550 560 570 580 590 600 

1 0 orf 23a . pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

I i 1 1 1 1 ! 1 1 i I M M 1 1 1 1 1 II 1 1 1 M 1 1 1 1 1 1 M 1 1 1 1 1 1 1 M M 1 1 1 1 1 1 1 I! 1 1 

orf 23 - 1 AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

550 560 570 580 590 600 



610 620 630 640 650 660 

15 orf 23a. pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

II 1 1 M 1 : 1 1 1 1 1 1 1 1 1 1 Ml I M 1 1 1 1 1 1 II 1 1 1 1 1 M 1 1 1 1 1 1 1 1 I M II I . I II 1 1 

orf 23 - 1 DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

610 620 630 640 650 660 



670 680 690 700 710 720 

20 orf 23a . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

■ I I I I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I II I I I I I 
orf 23 - 1 ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



25 orf 23a. pep TYRFKX 

MINI 

orf23-l TYRFKX 

Homology with a predicted ORF from N. gonorrhoeae

ORF23 (SEQ ID NO: 664) shows 93.4% identity over a 211aa overlap with a predicted ORF
(ORF23ng) (SEQ ID NO: 670) from N. gonorrhoeae:



orf23.pep            GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD 51
                     ||||||||||||||||||||||||||||||||||||||||||||||||| |
orf23ng     SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD 60

orf23.pep   GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111
            ||||||||||||||  |||||||||||||||||||| |||||||| |  |||||||||||
orf23ng     GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120

orf23.pep   GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171
            |||||  |||| ||||||||||||||||||| ||||||||||||||||||||||||||||
orf23ng     GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180

orf23.pep   GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 211
            |||||||||| ||  |||||||||||||||||||||||||
orf23ng     GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 240
The ORF23ng nucleotide sequence (SEQ ID NO: 669) is predicted to encode a protein comprising
amino acid sequence (SEQ ID NO: 670): 

1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE 

51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV

101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH

151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL 

201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY 

251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA 

301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL 

351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL

401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA

451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ 

501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL 

551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR 

601 TQPDRHSYGA LRTVNAAFTY RFK*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 671): 

1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC

151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTGGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC

501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG 

1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCcgatCC GCGCACCCAC AGCGCCAGCA TGTCATTGAC 

1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA 

1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA 

1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 
1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG 
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 
1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 
1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA 
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG 

1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT 

1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA 

1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC 
1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG 

2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 672; ORF23ng-1): 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 
51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 
151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA 
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR 
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL 
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP 
351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 
401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL 
451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 
501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN 
551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR 
601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA 
651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH 
701 YRTQPDRHSY GALRTVNAAF TYRFK* 



ORF23ng-1 (SEQ ID NO: 672) and ORF23-1 (SEQ ID NO: 666) show 95.9% identity in 725 aa 
overlap: 



30 



10 20 30 40 50 60 

orf 23-1 .pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

IMMIIIII III IIIMMMIIMMIIIIIIIIIMMIMIIII IIIIIMIIIII 

orf23ng-l MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT 

10 20 30 40 50 60 



35 



70 80 90 100 110 120 

orf 23 - 1 . pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQI YGSDRAGYNYLFARG 

hi III MM MMIM MINI Mill II I MINIMUM I II IIIIIMIIIII 

orf23ng-l PFGLPMTLRE I PQSVSVITSQQMRDQN I KTLDRALLQATGTSRQ I YGSDRAGYNYLFARG 

70 80 90 100 110 120 



40 



130 140 150 160 170 180 

orf 2 3 - 1 . pep SRIANYQINGI PVADALADTGNANTAAYERVEWRGVAGLLDGTGEPSATVNLVRKRLTR 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i I MMIMMMMM II 

orf23ng-l SRIANYQINGI PVADALADTGNANTAAYERVEWRGVAGLPDGTGEPSATVNLVRKHPTR 

130 140 150 160 170 180 



45 



190 200 210 220 230 240 

or f 2 3 - 1 . pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 MMMMMMMMM MMIMM IMMIMIM 

orf23ng-l KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGI 

190 200 210 220 230 240 



50 



250 260 270 280 290 300 

orf 23 - 1 . pep LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

MMMMMMIMM MMM MMMMIMM IMMIMMMMMMM 

orf23ng-l LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 

250 260 270 280 290 300 
310 320 330 340 350 360 

orf 23 - 1 . pep NLPAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH 

IMMMMMMIMMMMMIMI MIIIMI IMIIIIIIIIIIIIII I 

orf23ng-l NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 

310 320 330 340 350 360 



10 



370 380 390 400 410 420 

orf 23 -1 .pep SASVSLIGKYRLFGREHDLI AGINGYKYASNKYGERS I I PNAI PNAYEFSRTGAYPQPAS 

111 = 11 1 1 1 1 'I I M 1 1 M 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 i 1 1 II 1 1 M 1 1 1 1 1 1 1 1 : 1 

orf23ng-l . SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS 

370 380 390 400 410 420 



15 



430 440 450 460 470 480 

orf 23 - 1 . pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 

I I I M I I I I I I I I I I I I I I I I II I I I I I I I I I I I I = I I I : I I I : I II I I I I I I I I I I I I 
orf 23ng-l FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT 

430 440 450 460 470 480 



20 



490 500 510 520 530 540 

orf 23 - 1 . pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

MIMIII MM IMMIMII Ml MMMMMMMMMI MMMMMMI 

orf 23ng-l PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 



25 



550 560 570 580 590 600 

orf 2 3 - 1 . pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

I Ml I II M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 Ml 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 I 

orf 23ng-l AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR 

550 560 570 580 590 600 



30 



610 620 630 640 650 660 

orf 23 - 1 . pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 IMI MM IIMMIIMM 

orf23ng-l DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 

610 620 630 640 650 660 



35 



670 680 690 700 710 720 

orf 23 - 1 . pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

llh MMMMMMMMIIMM IMIIMI MMMMMIMMMMM 

orf 23ng-l ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF 

670 680 690 700 710 720 



orf 23-1. pep TYRFKX ' 
MINI 

orf23ng-l TYRFKX 

40 

In addition, ORF23ng-1 (SEQ ID NO: 672) shows significant homology with an OMP (SEQ ID 
NO: 1155) from E. coli: 

sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-FERRIOXAMINE B 
AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403 (D90745) Outer membrane 
protein FhuE precursor [Escherichia coli] >gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane 
protein FhuE precursor [Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for 
Fe(III)-coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor 
[Escherichia coli] Length = 729 

Score = 332 bits (843), Expect = 3e-90 
Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%) 

Query: 38 TITVTADRTASSN- -DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95 

T+ V TA + + Y+V+ T + MT R+IPQSV++++ Q+M DQ ++TL + 

Sbjct: 43 TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102 

5 Query: 96 LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP VADALADTGNANTAA 147 

G S+ SDRA Y ++RG +1 NY + +GIP + DAL+D A 
Sbjct: 103 ENTLG I S KSQADSDRALY YSRGFQIDNYMVDGIPTYFESRWNLGDALSDM AL 154 

Query: 148 YERVEWRGVAGLPDGTGEPSATVNLVRKHPTRKPLF-EVRAEAGNRKHFGLGADVSGSL 206 
+ERVEWRG GL GTG PSA +N+VRKH T + +V AE G+ AD+ L 

10 Sbjct: 155 FERVEVTOGATGIJyiTGTGNPSAAINMTOKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214 

Query: 207 NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266 

+G +R R+V + DSW S GI++ D+ T + AG +YQ+ + 

Sbjct: 215 TEDGKI RAR I VGGYQNNDS WLDRYNS EKTFFSG I VDADLGDLTTLS AGYEYQR IDVNS PT 274 

Query: 267 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 326 
15 +++ G + ++ + A +W+ + +F ++ +F W+ + + 

Sbjct: 275 WGGLPRWNTDGSSNSYDRARSTAPDWAYNDKEINKVFMTLKQQFADTWQATLNATHSEVE 334 

Query: 327 F- -RQPYGVAGVLSIDHSTAA- -TDLIPGY WHADPRTHSA- SMSLTGKYRLFG 374 

F + Y A V D ++ PG+ W++ R A + G Y LFG 

Sbjct: 33 5 FDSKmYVDAYTOKADGMLVGPYSNYGPG 394 

20 Query: 375 REHDLI AGINGYKYASNKYGER- - S I I PNAIPNAYEFSRTGAYPQPSSFAQTI PQYDTRR 432 

R+H+L+ G Y +N+Y +1 P+ I + Y F+ G + PQ Q++ Q DT 

Sbjct: 395 RQHNLMFG - GS YS KQNNRYFS SWAN I FPDE I GS FYNFN - - GNFPQTDWS PQSLAQDDTTH 451 

Query: 433 Q I GGYLATRFRAADNLSL I LGGRYSRYRAGS YNSRTQGMT Y - VS ANRFTP YTG I VFDXXX 491 
Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 

25 Sbjct: 452 MKSLYAATRVTLADPLHLILGARYTNWRVDT LTYSMEKNHTTPYAGLVFDIND 504 

Query: 4 92 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 551 

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 564 

Query: 552 ATAAGR DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608 

30 A + G +G T Y+A + + G E E+ G IT WQ+ G + + D +G+ +N 

Sbjct: 565 AQSTGTP I PGSNGETAYKAVDGTVS KGVE FELNGAI TDNWQLTFGATRY I AEDNEGNAVN 624 

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668 

p ++ p + K+FT+Y LP P T+G GV Q +TD P RA 
Sbjct: 625 P -NLPRTTVKMFTSYRL- PVNPE- LTVGGGVNWQNRVYTDTV TPYGTFRA E 672 

35 Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724 

Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729 

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF23-1 (SEQ ID NO: 666) (77.5 kDa) was cloned in pET and pGex vectors and expressed in 
E. coli, as described above. The products of protein expression and purification were analyzed by 
SDS-PAGE. Figure 15A shows the results of affinity purification of the His-fusion protein, and 
Figure 15B shows the results of expression of the GST-fusion in E. coli. Purified His-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 15C) and for ELISA 
(positive result). These experiments confirm that ORF23-1 (SEQ ID NO: 666) is a surface-exposed 
protein, and that it is a useful immunogen. 
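
By way of illustration only, a molecular weight quoted for an expressed protein can be sanity-checked against its amino acid sequence. The sketch below sums average residue masses (standard textbook values, to be treated as approximate) and adds one water molecule; the function name `approx_mass_da` is hypothetical and is not part of the method described in this specification. As a rough rule of thumb, about 110 Da per residue gives a figure of the same order as the 77.5 kDa quoted above for a protein of roughly 725 residues.

```python
# Average residue masses in daltons (approximate standard values).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added for the free N- and C-termini


def approx_mass_da(protein: str) -> float:
    """Approximate average mass of an unmodified polypeptide, in daltons."""
    seq = protein.rstrip("*")  # drop a trailing stop symbol, if present
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


# First 20 residues of ORF23ng-1 (SEQ ID NO: 672), used only as a short demo.
n_term = "MTRFKYSLLFAALLPVYAQA"
print(f"{approx_mass_da(n_term) / 1000:.2f} kDa for {len(n_term)} residues")
```

Such an estimate ignores fusion tags, signal peptide cleavage and post-translational modification, so it is only a plausibility check against the band seen on SDS-PAGE.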

Example 80 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 673): 



1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG.. 



This corresponds to the amino acid sequence (SEQ ID NO: 674; ORF24): 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS 
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV 
101 PCVPQTLKPI XSRMRATXSP TG.. 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 675): 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 
151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 
201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 
251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC 
351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA 
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 
451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA 
501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG 
551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG 
601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT 
651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT 
701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG 
751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT 
801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG 
851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC 
901 AAAGTTTGCG CCACGCTGAC GTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 676; ORF24-1): 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS 
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV 
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT 
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA 
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP 
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS 
301 KVCATLT* 



Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF24 (SEQ ID NO: 674) shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) 
(SEQ ID NO: 678) from strain A of N. meningitidis: 



15 



10 20 30 40 50 60 

orf24a.pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

I I II I I I I I I M II I I I I I I I I I I I I I Ml I I I I I I I I I I I I I : I II I i: I I I I I I I I 
orf 24 MRTAVVLLLIMP^4AASSAMMPEMVCAGVSPGTAI I SKPTEQTAVMASSLS SVSTPASAAA 

10 20 30 40 50 60 



20 



70 80 90 100 110 120 

orf24a.pep I I PSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKP I SSRMRATESP 

II II 1 1 II III II III MINIMI MINIMI 1 1 Mill MINI MM I III MM 

orf24 IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 



25 



130 140 150 160 170 180 

orf24a .pep TAGVGASDKSRI PNGIFS I FEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

II I I II II I I I II II I I II II I I I I II II II I I I II I I I II I II I I I I I I II II II II I I 
orf24 TAGVGASDKSRI PNGIFS I FEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 



30 



190 200 210 220 230 240 

orf24a.pep PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M M 1 1 M 1 1 M 1 1 M 1 1 III llllllll 

orf24 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVS PASLTA 

190 200 210 220 230 240 



35 



250 260 270 280 290 300 

orf24a.pep S I LI PARVLP ILMELHT I SWFI ASGMERXNTSSEGDI PFCTSAEKPP I KDTPMALAALS 

I I I I I I I I , I I I I I I I I I I I I I I I I I I I I I I' I I I I I I M I I I U I I I I I I II I I I 
orf24 SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 



orf 24a .pep KVCATLTX 
llllllll 

40 orf 24 KVCATLTX 

The complete length ORF24a nucleotide sequence (SEQ ID NO: 677) is: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA 
101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC 
151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG 

601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT 

701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT 

801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 678): 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS 
51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV 
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT 
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA 
201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP 
251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS 
301 KVCATLT* 

It should be noted that this protein includes a stop codon at position 198. 
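
For illustration, the position of such an internal stop codon can be checked by translating the nucleotide sequence in frame with the standard genetic code. The following is a minimal sketch, not the software used to generate the listings in this specification; the helper names `translate` and `internal_stops` are hypothetical. Codons containing an undetermined base (N) are reported as X, matching the convention used in the sequences above.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def internal_stops(protein: str) -> list:
    """1-based positions of stop symbols other than the final residue."""
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]


# First 30 nucleotides of SEQ ID NO: 677.
print(translate("ATGCGCACGG CAGTGGTTTT GCTGTTGATC"))
# Nucleotides 586-600 of SEQ ID NO: 677 (codons 196-200), spanning the TGA.
fragment = translate("GAGCCGTGAAACGCG")
print(fragment, internal_stops(fragment))
```

Applied to the 15-nucleotide fragment, the third codon is the stop; since the fragment starts at codon 196, this corresponds to position 198 of the full-length protein.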

ORF24a (SEQ ID NO: 678) and ORF24-1 (SEQ ID NO: 676) show 96.4% identity in 307 aa 
overlap: 

30 10 20 30 40 50 60 

or f 24a . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA 

I M 1 1 1 1 N 1 1 1 1 1 1 1 1 1 1 1 1 I Ml 1 1 1 1 1 1 1 M MINIMI IMMIMIM 

orf24-l MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

10 20 30 40 • 50 60 

35 70 80 90 100 110 120 

orf24a.pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKP I SSRMRATESP 

Mill 1 1 1 1 1 M I M 1 1 1 M 1 1] M 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 

orf24-l IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 

40 130 140 150 160 170 180 

orf24a.pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

MINI ! I M M 1 1 1 M 1 1 M 1 1 II 1 1 II 1 1 Ml 1 1 1 M 1 1 1 1 : 1 

orf24-l TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

130 140 150 160 170 180 

45 190 200 210 220 230 240 

orf24a.pep PGPDTPTL I TAS AS PEPXNAPAIXGLSSXALQNTT I LAQPKPSS VI SXVRLMVS PASLTA 

IIIIIIIIIIMIIIIIIIIIII lllhlllllllllllllhlll III IIIMIM 

orf 24 - 1 PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVS PASLTA 

190 200 210 220 230 240 
250 260 270 280 290 300 

orf 24a . pep S IL I PARVLP I LMELHTI S WF I ASGMERXNTS S EGD I PFCTSAEKPP I KDTPMALAALS 

IMIMIIIII III Mil IIIIIMMM III I III INI hll MM INI III II II 

orf24-l SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 

250 260 270 280 290 300 

orf 24a . pep KVCATLTX 
Illlllll 

orf24-l KVCATLTX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF24 (SEQ ID NO: 674) shows 96.7% identity over a 121 aa overlap with a predicted ORF 
(ORF24ng) (SEQ ID NO: 680) from N. gonorrhoeae: 

orf 24 .pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 60 

I II I II M 1 1 ! I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! hi 1 1 1 1 II II I M M M hi 1 1 1 1 1 1 

orf 24ng MRTAWLLL I MPMAAS S AMMPEMVCAGVS PGTAIMS KPTEQTAVMAS SLS S VNTP AS AAA 60 

orf 24 . pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPIXSRMRATXSP 120 

1 1 1 1 II I M 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MINI II 

orf 24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 120 

orf 24. pep TG 122 
h 

orf 24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180 

The complete length ORF24ng nucleotide sequence (SEQ ID NO: 679) is: 

1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA 
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC 
351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA 
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG 
451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG 
501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG 
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA 
601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT 
651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT 
701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG 
751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC 
801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG 
851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC 
901 AAAGTCTGCG CCACGCTGAC ATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 680): 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS 
51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV 
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT 
151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA 
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP 
251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS 
301 KVCATLT* 

5 

ORF24ng (SEQ ID NO: 680) and ORF24-1 (SEQ ID NO: 676) show 96.1% identity in 307 aa 
overlap: 



10 20 30 40 50 60 

orf 24 - 1 . pep MRTAWLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 

IIMMMMMMMMMMMMMMMMMMIIII IIMM IIMM III 

orf24ng MRTAWLLL IMPMAAS SAMMPEMVCAGVS PGTA IMS KPTEQTAVMASS LS SVNTPAS AAA 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 24-1 .pep IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

1 1 1 ! I M 1 1 1 1 M ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 24ng IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAWPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 24-1 .pep TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNWASEFSNAAFTT 

I I I I I I I I M I I M I I I I I I I I I I I I I I I I I I I I I I I I M I ::||l||:|hll 
orf24ng TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 24 -1 . pep PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

25 Illlllll Illlllll Ml I Mill I MINI 111 1 1 Ml Nil II Illlllll 

orf24ng PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 24-1 .pep SILIPARVLPILMELHTISWFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS 
30 | | M | | | | | | | | | | | | | | M | | | | | | | | | | | M | | | | | | | | : | | | | | | | | | | | | | | | | | 

orf 24ng S I LI PARVLP I LMELHT I S WF I ASGTERINTSSEGDI PFCTSAEKPP I KDTPMALAALS 

250 260 270 280 290 300 



orf 24-1. pep KVCATLTX 

35 Illlllll 
or f 2 4ng KVCATLTX 

Based on this analysis, including the presence of a putative leader sequence (first 18 aa, 
double-underlined) and putative transmembrane domains (single-underlined) in the gonococcal protein, it 
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 81 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 681): 
1 ..ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 
51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 
101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 
151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

5 

This corresponds to the amino acid sequence (SEQ ID NO: 682; ORF25): 

1 ..TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER 
51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 683): 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC 

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This corresponds to the amino acid sequence (SEQ ID NO: 684; ORF25-1): 



1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR 
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 
151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA 
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT 
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF25 (SEQ ID NO: 682) shows 98.3% identity over a 60 aa overlap with an ORF (ORF25a) 
(SEQ ID NO: 686) from strain A of N. meningitidis: 



                                                 10        20        30
orf25.pep                                TDVQKELVGEQRKWAQEKISNCRQAAAQAD
                                         |||||||||| |||||||||||||||||||
orf25a     VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD
          250       260       270       280       290       300

                   40        50        60
orf25.pep  RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
           |||||||||||||||||||||||||||||||
orf25a     RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
          310       320       330
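
For illustration, a percent identity figure of this kind can be reproduced by counting identical positions across two aligned sequences of equal length. The sketch below is an assumption about how such a figure may be computed, not the alignment program used in this specification; the function name `percent_identity` is hypothetical. An undetermined residue (X) is counted as a mismatch, which is consistent with 59 of 60 positions matching.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned columns holding the same residue in both rows."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # A gap character never counts as a match, even if both rows have one.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# The 60-residue overlap between ORF25 and ORF25a shown above.
orf25 = "TDVQKELVGEQRKWAQEKISNCRQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSID"
orf25a = "TDVQKELVGEXRKWAQEKISNCRQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSID"
print(round(percent_identity(orf25, orf25a), 1))  # 98.3
```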

The complete length ORF25a nucleotide sequence (SEQ ID NO: 685) is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA 

351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC 

451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 

601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 

651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 686): 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR 
51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP 
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD 
151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA 
201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT 
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC 
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a (SEQ ID NO: 686) and ORF25-1 (SEQ ID NO: 684) show 93.5% identity in 338 aa 
overlap: 

10 20 30 40 50 60 

45 orf 25a . pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 

I I I I I I I I , I I I I M I II M I I I I I i I I I I i I I II lllllllllllllllll II 
orf 25-1 MYRKL I ALPFALLLAACGREE PPKALECANPAVLQG I RGN IQETLTQEARS FAREDGRQF 

10 20 30 40 50 60 

70 80 90 100 110 120 

50 orf 25a. pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 

1 1 1 1 Mill Illlllllll I Mill II II Mill III MM II II II MUM 
orf25-1 VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 25a . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 

5 1 1 1 1 1 1 1 1 1 M II II II II I M 1 1 1 1 1 1 1 1 M 1 1 III 1 1 1 II I II 1 1 1 M II 1 1 1 1 1 1 

orf 2 5-1 SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

130 • 140 150 160 170 180 

190 200 210 220 230 240 

orf 25a. pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 

10 Illllllllllllll I II II III H II I II Mil Ml I llh I Mi II I I II 

orf25-l M I DGKAVKKEDAVR I LSGKAREEEPS KPTPED I LEHNAAGGDAGVPQAAEGAPE PE I LHP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 2 5a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 

15 lllllllllll MINI MM III II MINI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 lllllllllll 

orf 2 5 - 1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

orf 25a . pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

20 I I I I I I II II I I I II I II I I I I I I I I I II II I I I I II I I 

orf 25- 1 RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

310 320 330 

Homology with a predicted ORF from N. gonorrhoeae 

ORF25 (SEQ ID NO: 682) shows 100% identity over a 60 aa overlap with a predicted ORF 
(ORF25ng) (SEQ ID NO: 688) from N. gonorrhoeae: 

orf25.pep                                TDVQKELVGEQRKWAQEKISNCRQAAAQAD  30
                                         ||||||||||||||||||||||||||||||
orf25ng    VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD  308

orf25.pep  RQEYAEYLKLQCDTRMTRERIQYLRGYSID  60
           ||||||||||||||||||||||||||||||
orf25ng    RQEYAEYLKLQCDTRMTRERIQYLRGYSID  338

The complete length ORF25ng nucleotide sequence (SEQ ID NO: 687) is: 

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG 

51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA 

351 AACGTCTTTG GCAGACATCG TG.GAGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC 

451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT 

501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA 
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG 

701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC 

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc 

901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 688): 



1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR 
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP 
101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD 
151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA 
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT 
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25ng (SEQ ID NO: 688) and ORF25-1 (SEQ ID NO: 684) show 95.9% identity in 338 aa 
overlap: 



10 20 30 40 50 60 

orf 25-1 .pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 

M I I I I II I I I I I I I I I I I I I I I I M I I I I I I I I I H I I I I I ' I M I I I I I I I I I 
orf 25ng MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 
25 10 20 30 40 50 60 

70 80 . 90 100 110 120 

orf 25-1 .pep VDADKI IAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKANSPLLYGETAL 

II I Mill II I II II Mil III I III MM II II Mil Mil III I MM II II II hi 

orf 25ng VDADKIIAAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAEANSPLLYGETSL 
30 70 80 90 100 110 120 



130 140 150 160 170 180 

orf 25 - 1 . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 

:|l h I I II I I M I I I I II I I I I I M h M -II h II M I I I- I II I I I II I I M M 
or f 2 5 ng AD I VQQKTGGNVEFKDGVLTAAVRFLPAKDARTAF I DNTVGMATQTLS AALLP YGVKS I V 

35 130 140 150 160 170 180 



190 200 210 220 230 240 

orf25-l .pep MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 

lllllll 1 1 1 1 M 1 1 II 1 1 1 M 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I 

orf 25ng MIDGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPEPEILHP 
40 190 200 210 220 230 240 



45 



250 260 270 280 290 300 

orf 25 - 1 . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

II M II 1 1 1 1 1 1 1 1 1 1 1 1 , 1 ! M i M 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 

orf 25ng DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 ' 300 



50 



310 320 330 339 

orf 25- 1 . pep RQAAAQADRQE YAE YLKLQCDTRMTRER I QYLRGYS I DX 
IIIIIIIMIIIIIIII I I MIIIIIIIIIIIMM 
or f 2 5 ng RQAAAQADRQE YAE YLKLQCDTRMTRER I QYLRGYS I DX 

310 320 330 
Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein 
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 
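
The lipid attachment site prediction mentioned above can be illustrated with a short sketch. The text does not name the tool or pattern that was used, so the regular expression below is an assumption: it follows the PROSITE-style consensus for prokaryotic lipoprotein lipid attachment sites, and the function name and the positional window (cysteine between residues 15 and 35) are likewise illustrative assumptions rather than the method actually applied.

```python
import re

# Assumed PROSITE-style consensus for a prokaryotic lipoprotein lipid
# attachment site; the source does not state which pattern was used.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipid_attachment_cys(protein, window=(15, 35)):
    """Return the 1-based position of a candidate lipidated cysteine, or None.

    `window` is a hypothetical positional filter on the cysteine.
    """
    low, high = window
    for match in LIPOBOX.finditer(protein):
        cys_pos = match.end()  # the match ends on the cysteine, so end() is its 1-based index
        if low <= cys_pos <= high:
            return cys_pos
    return None


# N-terminal residues of the gonococcal protein (SEQ ID NO: 688) as listed above.
orf25ng_n_terminus = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQ"
print(find_lipid_attachment_cys(orf25ng_n_terminus))
```

Under these assumptions the sketch reports the cysteine of the LLAAC motif near the N-terminus of the gonococcal sequence.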

ORF25-1 (SEQ ID NO: 684) (37 kDa) was cloned in pET and pGex vectors and expressed in 
E. coli, as described above. The products of protein expression and purification were analyzed by 
SDS-PAGE. Figure 16A shows the results of affinity purification of the GST-fusion protein, and 
Figure 16B shows the results of expression of the His-fusion in E. coli. Purified His-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 16C), ELISA (positive 
result), and FACS analysis (Figure 16D). These experiments confirm that ORF25-1 (SEQ ID NO: 
684) is a surface-exposed protein, and that it is a useful immunogen. 

Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1 (SEQ 
ID NO: 684). 

Example 82 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 689): 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

// 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGGCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

14 51 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAA. . 



This corresponds to the amino acid sequence (SEQ ID NO: 690; ORF26): 






1 MQLIDYSHSF FSWPPFLAL ALAVITRRVL LSLGIGILXX VAFLVGGNPV 

51 DGLTHLKDMV VGLAWSDXDW SLGKPKILVF XILLGIFTSL LTYSGSN . . . 

// 

251 TSLV 

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP 

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KK. . 



Further work revealed the complete nucleotide sequence (SEQ ID NO: 691): 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

3 01 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

3 51 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

4 01 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 
4 51 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT 
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 
551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 
601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 
651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 
701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT 
751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC 
801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 
851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 
901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 
951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CGGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

14 01 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 



This corresponds to the amino acid sequence (SEQ ID NO: 692; ORF26-1): 



1 MQLIDYSHSF FSWPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWSDGDW SLGKP KILVF LILLGIFTSL LTY SGSNQAF 

101 ADWAKRHIKN R RGAKMLTAC LVFVTFID DY FHSLAVGAIA RPVTDKFKVS 

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLV TYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFW AWFSFDI GSM ARFEQAALNE AHDETAVSDA 

251 TKGRVY AL I I PVLALIASTV SAMI YTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTL GTIKT ADYPKAVWQG AKSM FGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFA TGT SW GTFGIMLP 

401 IAAAMAVKV E P ALIIPCMSA VMAGAVCG DH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KKRANA* 






Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical transmembrane protein HI1586 (SEQ ID NO: 1156) of 
H. influenzae (accession number P44263) 

ORF26 (SEQ ID NO: 690) and HI1586 (SEQ ID NO: 1156) show 53% and 49% amino acid 
identity in 97 and 221 aa overlap at the N-terminus and C-terminus, respectively: 

Orf26 1 MQLIDYSHSFFSWPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Orf26 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 
10 V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 

// 

Orf26 86 IFTSLLTYSGS- -NTSLVFGGTCGVFAWLCTL- -GTIKTADYPKAVWQGAKSMFGXXXX 141 
+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
15 HI1586 299 VFSVLGTFENTWGTSLWGGFCSI I ISTLLI ILDRQVSVPEYVRSWIVGIKSMSGAIAI 358 

Orf26 142 XXXXXXXSTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201 

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLP 418 

0rf26 202 IAAAMAVKVEPALI IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXX 261 
20 IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 

HI1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

Orf26 262 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 

S L GF T + L V+IF +K + 
HI1586 479 ATVATATS IGYI WGFTYSGLAGFAATAVSLIVI IFAVKKR 519 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 (SEQ ID NO: 690) shows 58.2% identity over a 502 aa overlap with an ORF (ORF26a) 
(SEQ ID NO: 694) from strain A of N. meningitidis: 

10 20 30 . 40 50 60 

orf 26 . pep MQL I DYSHS FFS WPP FLALALAVI TRRVLLSLG I GI LXXVAFLVGGNPVDGLTHLKDMV 

30 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1! II II 1 1 1 1 1 1 II I llllllllllllllllllll 

orf 26a MQL IDYSHSFFSVVPPFLALALAV I TRRVLLSLG I GILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 

70 80 90 99 
orf 26 . pep VGLAWSDXDWS LGKP K I LVFX I LLG I FTS LLT Y SGSNXX 

35 Ilillll llllllll III Ml MINI Ilillll 

o r f 2 6 a VGLAWSDGDWS LGKP KXLVFL I LLG I FTS LLT Y SGSNQAFADWAKRH I KNR RGAKMLTAC 

70 80 90 100 110 120 
orf26.pep 

or f 2 6a LVFVTFID DYFHSLAVGAXARPVTDKFKVSRAKIiAYILDSTAAPMCVLMP VSSWGASIIA 

130 140 150 160 170 180 

orf 26 . pep 

orf 26a TLAGLLV TYKITEYTPMGTFVAMSLMNYY ALFALIMVFVVAWFSFDI GSMARFEQAALiNE 

190 200 210 220 230 240 

100 110 

orf 26. pep TSLV 

I I I I 

or f 2 6 a AHDETAVSDGSWGRVY ALI I PVLALIASTVSAMI YTGAQASETFS I LGAFENTDVNTS LV 

250 260 270 280 290 300 

120 130 140 150 160 170 

orf 26 . pep FGGTCGVFAWLCTL GTI KTADYPKAVWQGAKSM FGAI AI LI LAWLI STW GEMHTGDYL 

I II M i HI 1 1 1 1 1 1 1 1 1 I II 1 1 ! I II I II M II 1 1 1 1 M I M 1 1 1 1 1 1 1 1 1 M i 

orf 2 6a FGGTCGVLAWLCTL GTIKIADYPKAVWQGAKSM FGAIAILILAWLISTW GEMHTGDYIj 

310 320 330 340 350 360 

180 190 200 210 220 230 

orf 26 .pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

I I I I I I I I I I I I I I I I I I I I I I II I I II II I I I I I I I I I I I I II I I I I : I : I M I I M I 

orf 26a STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA 

370 380 390 400 410 420 



240 250 260 270 280 290 

orf 26 . pep VMAGAVCG DHCSPISDTTILSSTGARCNHIDHVTSQLPY ALTVAAAAASGYLALGL TKSA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I 
or f 2 6 a VMAGAVCG DHCS P I SDTT I LSSTGARCNHIDHVTSQLPY ALTVAAAAASGYLALGL TKSA 

430 440 450 460 470 480 



300 310 
or f 2 6 . pep LLGFGTTGI VLAVLI FL LKDKK 

I I I I M I I I I I II I I I I I II 
orf 2 6a LLGFGXTGIVLAVLIFL LKDKKRANAX 

490 500 



The complete length ORF26a nucleotide sequence (SEQ ID NO: 693) is: 



1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

2 01 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT 
251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

3 01 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

4 01 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 
4 51 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT 
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 
551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 
601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 
651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG 
701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC 
751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 
801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT 
851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA 
901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC 
951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA 
1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT 
1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 
1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA 
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG 
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAT CCCTCACTGA TTATCCCGTG 
1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 
1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 
1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 
1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 
1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 
1501 AAAAAACGCG CCAACGCCTG A 



This encodes a protein having amino acid sequence (SEQ ID NO: 694): 



1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 
51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF 
101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS 
151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF 
201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG 
251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV 
301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV 
351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP 
401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI 
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD 
501 KKRANA* 



ORF26a (SEQ ID NO: 694) and ORF26-1 (SEQ ID NO: 692) show 97.8% identity in 506 aa 
overlap: 



35 



10 20 30 40 50 60 

orf26a.pep MQL I DYSHS FFS WPPFLALALAV I TRRVLLS LG I G I LVGVAFLVGGNPVDGLTHLKDMV 

I I I I I I I II I I I : I I I I I I I I I I I II I I M I I I I I I I i M I I I I I I I I I I I I M I I I 
orf26-l MQL IDYSHSFFSWPPFLALALAV I TRRVLLSLGIGI LVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 



40 



70 80 90 100 110 120 

orf 26a . pep VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

Illlllllllllllll IIIIIIIMIIIIIIIIIIIMI MINI MIMIMIIII 

orf 26-1 VGLAWSDGDWS LGKPKI LVFL I LLG I FTS LLTYSGSNQAFADWAKRH I KNRRGAKMLTAC 

70 80 90 100 110 120 



45 



130 140 150 160 170 180 

orf 26a . pep LVF VTF I DDYFHSLAVGAXARPVTDKFKVS RAKLAYILDS TAAPMCVLMP VSSWGAS I IA 

III II llllllllll II I I.I I I I I I I I I hi ! I I I II II Illlllllllllllll II 

orf 26-1 LVFVT FI DD YFHSLAVGA I ARPVTDKFKVSRTKLAY I LDS TAAPMCVLMP VSSWGASIIA 

130 140 150 160 170 180 



50 



190 200 210 220 230 240 

orf 26a . pep TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 : 1 I I I I I ' I I I I I I I I I I I I I M II I I 

orf 26-1 TLAGLLVTYK I TEYTPMGTFVAMSLMNYYALFAL I MVFWAWFSFD I GSMARFEQAALNE 

190 200 210 220 230 240 
250 260 270 280 290 300 

orf 26a. pep AHDETAVSDGSWGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

Illllllll:: Mill IIIIIM IIIIMIIMII lllllllllll IIIIIMIIII II 
orf 26-1 AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 



10 



310 320 330 340 350 360 

orf 26a .pep FGGTCGVLAWLCTLGT I KI ADYPKAVWQGAKSMFGAI AI LI LAWL I STWGEMHTGDYL 

Illllllllllllllllll IIIIIIIIIIMIIII II Illllllll IIIIIM llllll 

orf 26-1 FGGTCGVLAWLCTLGT I KTAD Y P KA VWQG AKSM FGA I A I L I LAWL I S T WGEMHTGD YL 

310 320 330 340 350 360 



15 



370 380 390 400 410 420 

orf 26a. pep STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA 

Illllllllllll 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 h M M 1 1 M I M 

orf 26-1 STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 



20 



430 440 450 460 470 480 

orf 26a . pep VMAGAVCGDHCS PI SDTT I LSSTGARCNH I DHVTSQLP YALTVAAAAASGYLALGLTKS A 

1 1 1 M 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 M III 1 1 1 1 i M 1 1 1 1 1 1 1 II 1 1. 1 1 1 M 1 1 1 ! 1 1 

orf 26-1 VMAGAVCGDHCS PI SDTT I LSSTGARCNH I DHVTSQLP YALTVAAAAASGYLALGLTKS A 

430 440 450 460 470 480 



25 



490 500 
orf 26a . pep LLGFGXTG I VLAVL I FLLKDKKRANAX 

I I I I I : I I M I M I I I I I I I I I I I I I 
orf 26- 1 LLGFGTTGIVLAVLI FLLKDKKRANAX 

490 500 



Homology with a predicted ORF from N. gonorrhoeae 

ORF26 (SEQ ID NO: 690) shows 94.8% and 99% identity in 97 and 206 aa overlap at the N- 
terminus and C-terminus, respectively, with a predicted ORF (ORF26ng) (SEQ ID NO: 696) from 
N. gonorrhoeae: 



30 



35 



orf 26 . pep MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILXXVAFLVGGNPVDGLTHLKDMV 60 

I I I I I I I I I I I III I I I I I I I ■ I I I I I I I I I I I I IIIIIM IIIIIMIIII 

orf26ng MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 60 

orf 26 .pep VGLAWSDXDWS LGKPKI LVFX I LLG I FTS LLTYSGSN 97 

Illlhl IIIIIIMIIII 1 1 1 1 1 1 1 1 1 1 

orf26ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 12 0 

// 



40 



orf 26 .pep 
orf 26ng 
orf 26 .pep 
orf 26ng 



TSLVFGGTCGVFAWLCTLGTIKTADYPKA 

M 1 1 1 1 1 1 1 1 Ml 1 1 II M M I Mi 1 1 1 

ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAWLCTFGTIKTADYPKA 



326 



326 



386 



VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 

I M 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 II 1 1 1 1 1 1 II 1 1 1 1 M 1 1 

VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 3 86 
orf26.pep ATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 446 

1 1 1 M ! 1 1 1 1 1 1 II I II 1 1 1 1 1 1 M < II 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 , 1 1 1 1 II 1 1 

orf26ng ATGTSWGTFGIMLPI AAAMAVKVEPALI I PCMSAVMAGAVCGDHCS P I SDTTI LSSTGAR 446 

orf 26 .pep CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 502 

5 1 1 1 M 1 1 1 1 1 1 1 1 M 1 1 1 Ml 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 I : I M 

orf26ng CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV 506 

The complete length ORF26ng nucleotide sequence (SEQ ID NO: 695) is: 

1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

10 51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT 

251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

15 301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC 

4 01 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

20 551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT 

651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG 

701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT 

751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

25 801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAG TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC 

951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT 

30 1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA 

35 1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT 

1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCGACGTTTG A 

40 

This encodes a protein having amino acid sequence (SEQ ID NO: 696): 

1 MQLIDYSHSF FSWPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV 

51 DGLTHLKDMV VGLAWADGDW SLGKP KILVF LILLGIFTSL LTY SGSNQAF 

101 ADWAKRH I KN R CGAKMLTAC LVFVTFID DY FHSLAVGAIA RPVTDKFKVS 

45 151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLV TYK ITEYTPMGTF 

201 VAMSLMNYYA LFALIMVFW AWFSFDI GSM ARFEQAALNE AQDETAASDA 

251 TKGRVY ALII PVLALIASTV SAMI YTGAQA SETFSILGAF ENTDVNTSLV 

301 FGGTCGVLAV VLCTF GTIKT ADYPKAVWQG AKSM FGAIAI LILAWLISTV 

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFA TGT SWGTFGIMLP 

50 401 IAAAMAVKV E P ALIIPCMSA VMAGAVCG DH CSPISDTTIL SSTGARCNHI 

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD 

501 KKRADV* 
ORF26ng (SEQ ID NO: 696) and ORF26-1 (SEQ ID NO: 692) show 98.4% identity in 505 aa 
overlap: 

10 20 30 40 50 60 

orf 26 - 1 . pep MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

Mllllllll IMIMMIMIII IIMIIMMII III Mllllll IIMIIMIIMI 

orf26ng MQLIDYSHSFFSWPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

10 20 30 40 50 60 



10 



70 80 90 100 110 120 

orf 26- 1 . pep VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

1 1 1 1 : 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 M 1 1 II i 1 1 M 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 Mllllll 

orf26ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 

70 80 90 100 110 120 



15 



130 140 150 160 170 180 

orf 26-1 .pep LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I = I I I I I I I I I I = I I I I I I I I I I I I I I I I I 

orf 26ng LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTAS PMCVLMPVSSWGASIIA 

130 140 150 160 1 170 180 



20 



190 200 210 220 230 240 

orf 26 - 1 . pep ' TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 

lllllllllllll IIIIIIIIMIIIIIIMIIIIMIIIIIIIIMIIMIII MINI 

orf26ng TLAGLLVT YKI TEYTPMGTFVAMS LMNY YALFAL IMVFWAWFS FD I GSMARFEQAALNE 

190 200 210 220 230 240 



25 



250 260 270 280 290 300 

orf 26 - 1 . pep AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

I: II Ihl I MM 1 1 II 1 1 II 1 1 1 IMMII 1 1 1 : 1 1 1 1 II Ml 1 1 1 1 1 1 1 II 1 1 II II 

orf 26ng AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 



30 



310 320 330 340 350 360 

orf 26- 1 . pep FGGTCGVLAWLCTLGTIKTADYPKAVWQGAKSMFGAIAILIIjAWLISTWGEMHTGDYL 

MMMI MMMMIMI MMIMMMI IIMMIMM! MIMIIIMM 

orf26ng FGGTCGVLAWLCTFGTIKTADYPKAVWQGAKSMFGAIAILILxAWLISTWGEMHTGDYL 

310 320 330 340 350 . 360 



35 



370 380 390 400 410 420 

orf26-l .pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

M 1 1 M I 1 1 1 1 1 1 1 1 M I M 1 1 1 II 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 II M 

orf26ng STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI IPCMSA 

370 380 390 400 410 420 



40 



430 440 450 460 470 480 

orf 26- 1 . pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M I 

or f 2 6 ng VMAGAVCGDHCS P I SDTT ILSS TGARCNH I DHVTS QL P YALTVAAAAASG YLALGLTKS A 

430 440 450 460 470 480 



45 



490 500 
orf 26-1 .pep LLGFGTTGI VLAVLI FLLKDKKRANAX 

IMIIIIIMIIMMIM MIM: 

orf2 6ng LLGFGTTGI VLAVLI FLLKDKKRADVX 

490 500 
In addition, ORF26ng (SEQ ID NO: 696) shows significant homology to a hypothetical 
H. influenzae protein (SEQ ID NO: 1156): 

sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical 
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H. 
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519 
Score = 538 bits (1370), Expect = e-152 
Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%) 





Query: 1 MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60 
M+LID+S S +S+VP LA+ LA+ TRR L +L V 
Sbjct: 14 MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73 

Query: 61 VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120 
V L +ADG+ + I++FL+LLG+ T+LLT SGSN+AFA+WA+ IK R GAK+L A 
Sbjct: 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132 

Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180 
LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II 
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192 

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240 
+ GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL 
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252 

Query: 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQA----SETFSILGAFENTDVN 296 
+D+ TKG+V LI+P+L LI +TVS MIYTGA+A + FS+LG FENT V 
Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTVVG 312 

Query: 297 TSLVFGGTCGVL--AVVLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTVVGEM 354 
TSLV GG C ++ +++ + +Y ++ G KSM G + +VG+M 
Sbjct: 313 TSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372 

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414 
TG YLS+LV+GNI FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA P L+ 
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432 

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474 
+PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIVV 492 

Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501 
S L GF T + L V+IF +K + 
Sbjct: 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519 





Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 83 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 697): 
1 . . AAGCAATGGT ATGCCGACGN . AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT . GTTTATCAGG ATGACAAGTT 

5 201 GGTCAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 698; ORF27): 

1 . . KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGWLEW 
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP* 


Further work revealed the complete nucleotide sequence (SEQ ID NO: 699): 

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

15 151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

20 4 01 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AGT AT CAAG A CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT 

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

25 651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 700; ORF27-1): 

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

30 51 VAGIAHA QDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG S I KTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHQRNGW LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 
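
The correspondence between a listed nucleotide sequence and its amino acid sequence can be checked with a short translation sketch. The code below is an illustration only and is not the software used to prepare the listing: it assumes the standard genetic code, reads from the first base in frame, and reports any codon containing an undetermined or ambiguous base (for example N) as X, in line with how such positions appear in the sequences above. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace is ignored.

    Codons containing anything other than A, C, G or T give 'X'.
    """
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 699, as listed above.
print(translate("ATGAAAAAAT TATCTCGGAT TGTATTTTCA"))
# Final nine nucleotides of the same sequence, ending in the stop codon.
print(translate("GAACCCTGA"))
```

The first call gives the opening ten residues of ORF27-1 (MKKLSRIVFS) and the second gives the C-terminal residues followed by the stop symbol.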

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF27 (SEQ ID NO: 698) shows 91.5% identity over an 82 aa overlap with an ORF (ORF27a) 
(SEQ ID NO: 702) from strain A of N. meningitidis: 

10 20 30 

40 orf 27 .pep KQWYADXS IKTEMVMVNDEPAKILTWDESG 

MINI M I I i M M I I I I I I I I I I I I 
or f 2 7a LSEGTGXRYYRNGGKESE I QFKQNKANGVWKQWYADGN IKTEMVMVNDEPAKILTWDESG 

140 150 160 170 180 190 

40 50 60 70 80 

45 orf 27 .pep RLLSELSIRHHQRNGWLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX 

MINIM MINIMI MM I MMMMNIIM MMMM 
orf 27a RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX 
200 210 220 230 240 

The complete length ORF27a nucleotide sequence (SEQ ID NO: 701) is: 

5 1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG 

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

10 251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

3 01 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 
351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

4 01 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 
4 51 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

15 501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT 

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 



20 



This encodes a protein having amino acid sequence (SEQ ID NO: 702): 



1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

25 151 EIQFKQNKAN GVWKQWYADG N I KTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGW LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP* 

ORF27a (SEQ ID NO: 702) and ORF27-1 (SEQ ID NO: 700) show 94.7% identity in 245 aa 
overlap: 

30 10 20 30 40 50 60 

orf 27a . pep MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I I ! 1 1 1 1 1 1 Ml IIIIIIMIIh MINI I 

orf 27-1 MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSWAGIAHAQDF 

10 20 30 40 50 60 

35 70 80 90 100 110 120 

orf 27a . pep XYPSMKKYSEPYIVASTOlKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP 

M 1 1 M 1 1 1 1 M I II 1 1 1 1 1 1 1 1 1 : 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 27-1 YYPSMKKYS E PY I VASTQ I KS FVPTLQNGML I LWHFNGQKKMAGGFS KGKPDGEWVNWYP 

70 80 90 100 110 120 

40 130 140 150 160 170 180 

orf 2 7a . pep NGKKSAVMPY KNGLSEGTGXRYYRNGGKESE I QFKQNKANGVWKQWYADGN I KTEMVMVN 

Illlllllllil MIMI 1 1 1 M . 1 1 1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 1 h I II 1 1 1 1 

or f 2 7 - 1 NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGS I KTEMVMVN 

130 140 150 160 170 180 

45 190 200 210 220 230 240 

orf 27a . pep DEPAK I LTWDESGRLLS ELS I HHHXRNGWLEWYEDGSKKXEAVYQDDKL VRKTQWDXDG 

MM MIMI MIMI Ml IMIIIIIMIIM 1 1 1 II 1 1 1 1 1 1 1 1 1 II II 

orf 27-1 DEPAKI LTWDESGRLLS ELS I RHHQRNGWLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG 

190 200 210 220 230 240 
orf27a.pep YLIEPX 
MINI 

orf27-l YLIEPX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF27 (SEQ ID NO: 698) shows 96.3% identity over 82 aa overlap with a predicted ORF 
(ORF27ng) (SEQ ID NO: 704) from N. gonorrhoeae: 

orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG 30
                                          |||||| |||||||||||||||||||||||
orf27ng     LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193

orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82
            ||||||||||| ||||||||||||||||| ||||||||||||||||||||||
orf27ng     RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245

The complete length ORF27ng nucleotide sequence (SEQ ID NO: 703) is: 

1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT 

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 704): 

1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV 

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP* 
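
The relationship between a nucleotide listing and the protein it "encodes" can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the convention of writing `X` for any codon containing an ambiguous base are illustrative assumptions, not part of the original disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: this
# table applies to the listed ORFs; it is not stated in the text).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored.

    Codons containing anything other than A, C, G or T are reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 nucleotides of SEQ ID NO: 703 as printed above.
print(translate("ATGAAGAAAT TATCTCGGAT TGTATTTTCA"))  # MKKLSRIVFS
# Last 18 nucleotides of SEQ ID NO: 703 (positions 721-738).
print(translate("TATTTAATCG AACCCTGA"))  # YLIEP*
```

The two fragments reproduce the first ten and last five residues of SEQ ID NO: 704, with `*` marking the stop codon.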

ORF27ng (SEQ ID NO: 704) and ORF27-1 (SEQ ID NO: 700) show 98.8% identity in 245 aa 
overlap: 

                    10        20        30        40        50        60
orf27-1.pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
            |||||||||| ||||||||||||||||||||||||||||||||||||||| |||||||||
orf27ng     MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27-1.pep YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27-1.pep NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27-1.pep DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
            |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
orf27ng     DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27-1.pep YLIEPX
            ||||||
orf27ng     YLIEPX
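
The identity figures quoted for these alignments are the fraction of aligned positions that carry the same residue. The following is a minimal sketch of that calculation for two gap-free, equal-length fragments; the function name `percent_identity` is hypothetical, and treating an undetermined residue (`X`) as a mismatch is an assumption that may differ from the alignment program actually used.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Positions holding 'X' (undetermined residue) or '-' (gap) never count
    as matches. Both strings must have the same length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y and x not in "X-")
    return round(100.0 * matches / len(a), 1)


# First 40 residues of SEQ ID NO: 702 (ORF27a) and SEQ ID NO: 704 (ORF27ng)
# as printed above; they differ at positions 11, 24 and 39.
orf27a_head = "MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXS"
orf27ng_head = "MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMS"

print(percent_identity(orf27a_head, orf27ng_head))  # 92.5
```

Applied to full-length aligned sequences, the same count divided by the overlap length gives figures of the kind reported here (for example 242 identical positions out of 245 is 98.8%).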

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (SEQ ID NO: 700) (24.5kDa) was cloned in pET and pGex vectors and expressed in 
E.coli, as described above. The products of protein expression and purification were analyzed by 
SDS-PAGE. Figure 17A shows the results of affinity purification of the GST-fusion protein, and 
Figure 17B shows the results of expression of the His-fusion in E.coli. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for ELISA, which gave a positive result, 
confirming that ORF27-1 (SEQ ID NO: 700) is a surface-exposed protein and a useful immunogen. 

Example 84 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 705): 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence (SEQ ID NO: 706; ORF47): 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VM 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 707): 



1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

15 251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT 

401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC 

451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 

551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT 

651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC 

25 751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

30 1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 708; ORF47-1): 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS 

201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP 

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT 

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH 

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG* 
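
A quick consistency check between a complete coding sequence and its translation is that the nucleotide length, divided by three, exceeds the residue count by exactly one (the stop codon). The following is a minimal sketch of that arithmetic; the function name `residues_encoded` is hypothetical and the check assumes the listing starts at the first base of the start codon and ends with the stop codon.

```python
def residues_encoded(cds_length_nt: int) -> int:
    """Number of amino acid residues encoded by a complete coding sequence.

    The length must be a multiple of three; the final codon is taken to be
    the stop codon and is not counted as a residue.
    """
    if cds_length_nt <= 0 or cds_length_nt % 3 != 0:
        raise ValueError("a complete coding sequence has a positive length divisible by 3")
    return cds_length_nt // 3 - 1


# SEQ ID NO: 707 runs to position 1155, SEQ ID NO: 703 to position 738.
print(residues_encoded(1155))  # 384, the length of ORF47-1 (SEQ ID NO: 708)
print(residues_encoded(738))   # 245, the length of ORF27ng (SEQ ID NO: 704)
```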

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 
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ORF47 (SEQ ID NO: 706) shows 99.4% identity over a 172aa overlap with an ORF (ORF47a) 
(SEQ ID NO: 710) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf47.pep   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf47a      MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47.pep   IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170
orf47.pep   MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

orf47a      GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240

The complete length ORF47a nucleotide sequence (SEQ ID NO: 709) is: 



1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

25 151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

3 01 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

3 51 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT 
30 4 01 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC 

4 51 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 
501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA 
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT 
6 01 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT 

35 651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT 

751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 

801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC 

851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

40 901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC 

1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG 

45 1151 GTTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 710): 



1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG 
101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS 
201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP 
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT 
301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH 
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG* 

ORF47a (SEQ ID NO: 710) and ORF47-1 (SEQ ID NO: 708) show 99.2% identity in 384 aa 
overlap: 

                    10        20        30        40        50        60
orf47a.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47a.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47a.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47a.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
            |||||||||||||||||||||||||||||||||||||||||||  |||| ||||||||||
orf47-1     GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47a.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47a.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47a.pep  LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47-1     LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380



Homology with a predicted ORF from N. gonorrhoeae 



ORF47 (SEQ ID NO: 706) shows 97.1% identity over 172 aa overlap with a predicted ORF 
(ORF47ng) (SEQ ID NO: 712) from N. gonorrhoeae: 






ORF47       MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60
ORF47ng                                                                  60

ORF47       IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
ORF47ng

ORF47
ORF47ng




The ORF47ng nucleotide sequence (SEQ ID NO: 711) is predicted to encode a protein comprising 
amino acid sequence (SEQ ID NO: 712): 



1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG 

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS 

201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP 

251 IEETSCGSVA GICYRLGNSS G 



The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala 
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the 
meningococcal protein (see also Pseudomonas stutzeri orf396 (SEQ ID NO: 1157), accession 
number e246540): 

TM segments in ORF47ng 



INTEGRAL Likelihood = -5.63 Transmembrane 52 - 68 
INTEGRAL Likelihood = -3.88 Transmembrane 169 - 185 
INTEGRAL Likelihood = -3.08 Transmembrane 82 - 98 
INTEGRAL Likelihood = -1.91 Transmembrane 134 - 150 
INTEGRAL Likelihood = -1.44 Transmembrane 107 - 123 
INTEGRAL Likelihood = -1.38 Transmembrane 227 - 243 
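
The six predicted segments listed above are ordered by likelihood score rather than by position. To compare them with the substitution positions mentioned in the text (residues 87 and 140), it can help to read them in sequence order. The following is a minimal sketch that parses lines of the form `INTEGRAL Likelihood = <score> Transmembrane <start> - <end>`; the line layout is taken from this listing, and the names `Segment`, `parse_segments` and `covering` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Pattern for one report line, as laid out in the listing above.
LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

REPORT = """
INTEGRAL Likelihood = -5.63 Transmembrane 52 - 68
INTEGRAL Likelihood = -3.88 Transmembrane 169 - 185
INTEGRAL Likelihood = -3.08 Transmembrane 82 - 98
INTEGRAL Likelihood = -1.91 Transmembrane 134 - 150
INTEGRAL Likelihood = -1.44 Transmembrane 107 - 123
INTEGRAL Likelihood = -1.38 Transmembrane 227 - 243
"""


def parse_segments(text: str) -> List[Segment]:
    """Extract every transmembrane segment found in a report."""
    return [Segment(float(m[1]), int(m[2]), int(m[3])) for m in LINE.finditer(text)]


def covering(segments: List[Segment], residue: int) -> Optional[Segment]:
    """Return the segment that contains the given residue, if any."""
    return next((s for s in segments if s.start <= residue <= s.end), None)


segments = sorted(parse_segments(REPORT), key=lambda s: s.start)
for seg in segments:
    print(f"{seg.start:>3}-{seg.end:<3}  likelihood {seg.likelihood}")

print(covering(segments, 87))   # Segment(likelihood=-3.08, start=82, end=98)
print(covering(segments, 140))  # Segment(likelihood=-1.91, start=134, end=150)
```

Under these assumptions, both substitutions named in the text fall inside predicted transmembrane segments (82 - 98 and 134 - 150 respectively).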



Further work revealed the complete gonococcal DNA sequence (SEQ ID NO: 713): 



1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG 

251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG 

351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT 

401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC 

451 CACGtCcAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA 

501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA 

551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT 

601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT 

651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG 

701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC 

751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC 
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801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC 

851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT 

901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA 

951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA 

1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC 

1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC 

1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG 

1151 GTTGA 



This encodes a protein having amino acid sequence (SEQ ID NO: 714; ORF47ng-1): 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG 

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS 

201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP 

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT 

301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH 

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG* 



ORF47ng-1 (SEQ ID NO: 714) and ORF47-1 (SEQ ID NO: 708) show 97.4% identity in 384 aa 
overlap: 

                    10        20        30        40        50        60
orf47-1.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47-1.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            |||||||||||||||||||||||||| |||||||||||||||| ||||||||||||||||
orf47ng-1   IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47-1.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            |||||||||| |||||||| ||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47-1.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
            | |||||||||||||||||||||||||||||||||| ||||||  |||| ||||||||||
orf47ng-1   GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47-1.pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47-1.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47-1.pep LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47ng-1   LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380

Furthermore, ORF47ng-1 (SEQ ID NO: 714) shows significant homology to an ORF (SEQ ID NO: 
1157) from Pseudomonas stutzeri: 

gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396 

Score = 155 bits (389), Expect = 5e-37 
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%) 

Query:   7
           P+W +AFRPF+ +LY L++-LW +TG GF WH HEM++G+A +
Sbjct:  14 PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP- -GFQPTGGWLAWHRHEMLFGFAMAI 71

Query:  60 VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119
           V FLLTAV TWTGQ G LVGL A WLAAR+ + + G AA L LF
Sbjct:  72 VAGFLLTAVQTWTGQTAPSGNRLVGLAA WLAARL - GWLFGLPAAWLAPLDLLFLVALVW 130

Query: 120 CMALPVIRSQNRRNYVAVFAIFVLGGTHAAFXXXXXXXXXXXXXXXXXXXXXMVSGFIGL 179
           MA + + +RNY V + ++ G +V+ + L
Sbjct: 131 MMAQMLWAVRQKRNYP I WVLSLMLGADVLI LTC 190

Query: 180 IGMRIISFFTSKRLNVPQIPSP-KWVAQASLWLPMLTAILMAHGV----MPWLSAAFAFA 234
           IG R+I FFT + L P W+ A L + A+L A GV PL FA
Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWVWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293
           GV +++ RW+ K + K +LW L L+ + + +F A
Sbjct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLWAAFGLALWHFGLLAQSSPSLHALSV 309

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353
           M+AR LGHTG + P+AFL FS +
Sbjct: 310 GSMSGLILANIARVTLGHTGRPLQLPAGI IG- AFVL FNLGTAARVFLSVAWPVGGLW 365

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384
           ++V + LA +Y W+Y P L+ R DG PG
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 85 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 715): 



1 ..ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC .. 
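
The partial sequence above contains letters other than A, C, G and T (for example m, y, r, k, s and w). Assuming these follow the standard IUPAC nucleotide ambiguity codes, each such letter stands for a set of possible bases, and a codon containing one can correspond to more than one amino acid, which is consistent with the `X` residues in the translation that follows. The sketch below expands an ambiguous codon into its concrete alternatives; the function name `expand` is hypothetical.

```python
from itertools import product
from typing import List

# IUPAC nucleotide ambiguity codes (assumed to be the convention used in
# the listing; the text itself does not say so).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(seq: str) -> List[str]:
    """List every unambiguous sequence compatible with an IUPAC-coded one."""
    pools = [IUPAC[base] for base in seq.upper()]
    return ["".join(choice) for choice in product(*pools)]


# Codon 9 (mTC) and codon 11 (GyC) of SEQ ID NO: 715 as printed above.
print(expand("mTC"))  # ['ATC', 'CTC']
print(expand("GyC"))  # ['GCC', 'GTC']
```

Under the standard genetic code, ATC/CTC encode Ile/Leu and GCC/GTC encode Ala/Val, so neither residue can be assigned from the partial sequence alone.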

This corresponds to the amino acid sequence (SEQ ID NO: 716; ORF67): 

1 ..MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVVVT VSGVXXQLGX 

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF 

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF 

151 CLGFFVVVVY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI .. 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. gonorrhoeae 

ORF67 (SEQ ID NO: 716) shows 51.8% identity over 199 aa overlap with a predicted ORF 
(ORF67ng) (SEQ ID NO: 718) from N. gonorrhoeae: 

or f 6 7 . pep MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 3 0 

Illlllll I II I I I II I lllllll 
orf 67ng TNFEIAVLSGMTVRVFYCARPAPVNGGRLKMPSEGSDGIGIGESEAVAHAQRGFVGFEAG 14 6 

25 90 100 110 120 130 140 

orf 67 .pep VFQAS PVWTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 90 

i I I ! 1 I j I I : - : I I INI:: ::: || 111 = 11 I 
orf 6 7ng VFQASPVWAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRNCCVSI 2 06 

orf 67 .pep XWXXXXSRGFXXHRMNLMFNVSVGDARAD IGFEF I VEFE I VNGGQAERRNGVEAAVSLMF 150 

30 : | : |:: : : | | | | | | | : | | I II I : I I I I I I I I I I I I I I I I I I II III 

orf 6 7ng TRVGGKSTCYFFSRIDAVSDVSVGDARTD IGFEFWEFE I VNGGQAERRNGVECAVFLMF 266 

orf 67 .pep CLGFFW WYLFSNFFSRRITFF-PFSVTGI ICRYSPAAEI 190 

I II : : I : I : : I : IT lllll : I I I I : 

orf6 7ng RLLVFYVKLVAAKSFI ILS FQLFYVHGIFI WPFPVTGI IRGDAPAAEWADRHPGVDGM 326 

35 

The ORF67ng nucleotide sequence (SEQ ID NO: 717) is predicted to encode a protein comprising 
amino acid sequence (SEQ ID NO: 718): 

1 MPSETVGSIV NVGVDESVGF SPPFPSIQHF YRFHRIHRIR LFRPPGPMQL 

51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR 

101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA 

151 SPVVVAVAGV QGQAGRDVYA HARHRAEAQA AAAVAFLIGV FLRMSVRINR 

201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF VVEFEIVNGG 

251 QAERRNGVEC AVFLMFRLLV FYVKLVAAKS FIILSFQLFY VHGIFIVVPF 

301 PVTGIIRGDA PAAEVVADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI 

351 IVGNAFGGVG * 



Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 86 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 719): 



1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA... 



This corresponds to the amino acid sequence (SEQ ID NO: 720; ORF78): 

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK 

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA... 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 721): 

1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT 

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 722; ORF78-1): 

1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK 

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP 

151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ 

201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ* 

Computer analysis of this amino acid sequence predicts several transmembrane domains, and also 
gave the following results: 

Homology with the dedA homologue of H.influenzae (accession number P45280) (SEQ ID NO: 
1158) 

ORF78 (SEQ ID NO: 720) and the dedA homologue (SEQ ID NO: 1158) show 58% aa identity in 
144aa overlap: 



Orf78:   4 FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
           FL  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GV
DedA:   20 FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78:  62 LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
           L GD  M+  GRI+G   L F PI  I+T  R   V+EKF +YGN VLFVARFLPGLR
DedA:   80 LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122 VFVTAGISRKVSYLRFIIMDGLAA 145
           +++ +GI+R+VSY+RF+++D  AA
DedA:  140 IYMVSGITRRVSYVRFVLIDFCAA 163





Homology with a predicted ORF from N. meningitidis (strain A) 



ORF78 (SEQ ID NO: 720) shows 93.8% identity over a 145aa overlap with an ORF (ORF78a) 
(SEQ ID NO: 724) from strain A of N. meningitidis: 



                    10        20        30        40        50        60
orf78.pep   MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78a      MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf78.pep   VLVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRT
            |||||||||||||||||  | | ||| |||| || |||||||||||||||||||||||||
orf78a      VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                    70        80        90       100       110       120

                   130       140
orf78.pep   AVFVTAGISRKVSYLRFIIMDGLAA
            |||||||||||||||||:|||||||
orf78a      AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                   130       140       150       160       170       180

The complete length ORF78a nucleotide sequence (SEQ ID NO: 723) is: 

1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 
51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 
101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 



151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

2 01 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 

5 351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

4 01 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 

4 51 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

10 601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 724): 

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP 

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ 

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78a (SEQ ID NO: 724) and ORF78-1 (SEQ ID NO: 722) show 89.0% identity in 227 aa 
overlap: 

10 20 30 40 50 60 

orf 78a . pep MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 

i 1 1 : 1 M II 1 1 1 1 1 M 1 1 1 M 1 1 1 1 U 1 1 1 1 II M 1 1 1 1 1 1 1 M 1 1 II II I i 1 1 1 1 1 M 

25 orf 78 - 1 MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 78a . pep VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 

I I I I I I I I I I I II I I I II I Ml I I I I II I! I I I I I I I I I I I I I I I I II I , I M I I I I 
30 orf 78- 1 VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 78a . pep AVFVTAG I SRKVS YLRFL IMDGLAAL I S VPVW I YLGEYGAHNIDWLMAKMHS LQSGI F I A 
I I I I I I I ' I I I I Ml I h I I I I I I I I I I I hi I h I I I I I I I I I I I I I M I I , I I I I h 
35 orf 78-1 AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 

130 140 150 160 170 180 

190 200 210 220 

orf78a.pep  LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX
            ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::||
orf78-1     LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae 



ORF78 (SEQ ID NO: 720) shows 97.4% identity over 38 aa overlap with a predicted ORF 
(ORF78ng) (SEQ ID NO: 726) from N. gonorrhoeae: 
orf78.pep   XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137
                                          ||||||||||||||||||||||||||||||
orf78ng                                 YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32

orf78.pep   IIMDGLAA 145
            :|||||||
orf78ng     LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92

The ORF78ng nucleotide sequence (SEQ ID NO: 725) is predicted to encode a protein comprising 
amino acid sequence (SEQ ID NO: 726): 

1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL 
51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ 
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence (SEQ ID NO: 727): 



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 

101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 

201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 

3 51 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence (SEQ ID NO: 728; ORF78ng-1):

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP

151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ

201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ*

ORF78ng-1 (SEQ ID NO: 728) and ORF78-1 (SEQ ID NO: 722) show 88.1% identity in 227 aa
overlap:



10 20 30 40 50 60 

orf78-1.pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78ng-1   MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 78-1 .pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 

I :| I M 1 1 1 1 1 1 1 1 1 1 1 Ml ! 1 1 1 1 1 M Mill I IIMIIIIIIIMII 

orf 78ng-l VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 

70 80 90 100 110 120 
                   130       140       150       160       170       180
orf78-1.pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI
            |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||:
orf78ng-1   AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA

130 140 150 160 170 180 

190 200 210 220 

orf 78-1 .pep LG I GAT WAV? I WWKKRQR I Q F YRS KLKEKRAQRKAAKAAKKAAQS KQX 

Ih h::|hlhlh: hlhMMIIhlll MlllllhMI 
orf 7 8ng- 1 LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 

190 200 210 220 

Furthermore, ORF78ng-1 (SEQ ID NO: 728) shows homology to the dedA protein (SEQ ID NO:
1158) from H. influenzae:

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20)
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212
Score = 223 bits (563), Expect = 7e-58
Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%)

Query: 5   LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL
Sbjct: 21  LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80

Query: 63  AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  +
Sbjct: 81  AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182
           ++ +GI+R+VSY+RF+++D AA+ISVP+WIYLGE GA N+DWL ++ Q I+I +G
Sbjct: 141 ...

Query: 183 ...
Sbjct: 201 ...



Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
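The text does not state which method was used to identify the putative transmembrane domains. Purely as an illustration, the following sketch applies a sliding-window Kyte-Doolittle hydropathy average to the first 60 residues of ORF78ng-1 (SEQ ID NO: 728). The window length (19) and cut-off (1.6) are conventional values assumed here, not parameters taken from this specification, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds threshold."""
    return [i for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean > threshold]


# First 60 residues of ORF78ng-1 (SEQ ID NO: 728) as listed in the text.
orf78ng_1_start = "MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG"
print(candidate_tm_windows(orf78ng_1_start))
```

Windows flagged in this way are only candidates; a dedicated topology predictor would normally be used to confirm them.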



Example 87 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 729): 



1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 
351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C...

This corresponds to the amino acid sequence (SEQ ID NO: 730; ORF79): 

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA

51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 731): 



1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence (SEQ ID NO: 732; ORF79-1): 

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA

51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG

101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH

151 HGEAHQH*
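The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked by translating with the standard genetic code. The following is a minimal sketch under that assumption; the function name `translate` is illustrative and is not taken from this specification. Any codon containing a character other than A, C, G or T is reported as X, which is a simplification.

```python
# Standard genetic code, codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base; spaces are ignored, '*' marks a stop codon."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous or unreadable bases are reported as X.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Bases 451-474 of SEQ ID NO: 731 as listed in the text.
print(translate("CACGGCGAAG CGCATCAGCA CTAA"))  # HGEAHQH*
```

The printed residues agree with positions 151-157 of ORF79-1 (SEQ ID NO: 732) followed by the stop codon.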

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF79 (SEQ ID NO: 730) shows 94.6% identity over a 147 aa overlap with an ORF (ORF79a)
(SEQ ID NO: 734) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf79.pep   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
            || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79a      MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf79.pep   PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79a      PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
                    70        80        90       100       110       120

                   130       140
orf79.pep   VTLKFKNAKAQTVQLEVKIAPMPAMNH
            |||||||||||||||||| ||| ||:|
orf79a      VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
                   130       140       150

The complete length ORF79a nucleotide sequence (SEQ ID NO: 733) is:



1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA

This encodes a protein having amino acid sequence (SEQ ID NO: 734): 



1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH

151 HGEAHQH* 

ORF79a (SEQ ID NO: 734) and ORF79-1 (SEQ ID NO: 732) show 94.9% identity in 157 aa
overlap:



                    10        20        30        40        50        60
orf79a.pep  MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS
            || ||||||||||||||||||:|||||||||||||||:||||||||||||||||||||||
orf79-1     MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf79a.pep  PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf79-1     PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
                    70        80        90       100       110       120

                   130       140       150
orf79a.pep  VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX
            |||||||||||||||||| ||| ||:||||||||||||
orf79-1     VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
                   130       140       150

Homology with a predicted ORF from N. gonorrhoeae



ORF79 (SEQ ID NO: 730) shows 96.1% identity over 76 aa overlap with a predicted ORF 
(ORF79ng) (SEQ ID NO: 736) from N. gonorrhoeae: 
orf79.pep   FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101
                                          ||||||||||||:|||||||||||||||||
orf79ng                                   INDNGVMRMREVKGGVPLEAKSVTELKPGS 30

orf79.pep   YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 147
            ||||||||||||||||||||||||||||||||||||| ||| ||||
orf79ng     YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86

An ORF79ng nucleotide sequence (SEQ ID NO: 735) was predicted to encode a protein 
comprising amino acid sequence (SEQ ID NO: 736): 

1 .. INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV

51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH*

Further work revealed the complete gonococcal DNA sequence (SEQ ID NO: 737): 

1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC

451 CACGGCGAAG CGCATCAGCA CTAA

This corresponds to the amino acid sequence (SEQ ID NO: 738; ORF79ng-1):

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA
51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 
151 HGEAHQH* 


ORF79ng-1 (SEQ ID NO: 738) and ORF79-1 (SEQ ID NO: 732) show 95.5% identity in 157 aa
overlap: 

                    10        20        30        40        50        60
orf79-1.pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS
            |||||||||||||||||||||||||||||||||||||:|||||||||||| |||:||||
orf79ng-1   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf79-1.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
            |||||||||||||||||||||||:||||||||||||||||||||||||||||||||||||
orf79ng-1   PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP
                    70        80        90       100       110       120

                   130       140       150
orf79-1.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX
            |||||||||||||||||| ||| |||||||||||||||
orf79ng-1   VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX
                   130       140       150
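Because ORF79-1 (SEQ ID NO: 732) and ORF79ng-1 (SEQ ID NO: 738) are both 157 residues long and align without gaps, the quoted identity can be checked by a position-by-position comparison. The sketch below does this; the helper name `percent_identity` is illustrative, and the program that produced the alignment in the text is not stated, so this is a check of the arithmetic rather than a reimplementation of that program.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # A gap character never counts as a match.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


# ORF79-1 (SEQ ID NO: 732) as listed in the text, stop codon omitted.
orf79_1 = (
    "MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEA"
    "KQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPG"
    "SYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNHGHH"
    "HGEAHQH"
)
# ORF79ng-1 (SEQ ID NO: 738) as listed in the text, stop codon omitted.
orf79ng_1 = (
    "MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEA"
    "IQDFVLGGSMPVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPG"
    "SYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHH"
    "HGEAHQH"
)

print(round(percent_identity(orf79_1, orf79ng_1), 1))  # 95.5
```

The two sequences differ at 7 of 157 positions, giving 150/157, which rounds to the 95.5% stated above.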
Furthermore, ORF79ng-1 (SEQ ID NO: 738) shows significant homology to a protein (SEQ ID
NO: 1159) from Aquifex aeolicus:

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151
Score = 63.6 bits (152), Expect = 6e-10
Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%)

Query: 24  VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83
           V+  W      G       M I N+    D+++G    +A RVE+H  + +N V +M
Sbjct: 27  VKHPWVMEPPPGPNTTMMGMIIWNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86

Query: 84  KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137
           +  + +  K   E K   YHVM +GLKK++KEGDK+ V L F+ +   TV+  V
Sbjct: 87  ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF79-1 (SEQ ID NO: 732) (15.6 kDa) was cloned in the pET vector and expressed in E. coli, as
described above. The products of protein expression and purification were analyzed by SDS-PAGE.
Figure 18A shows the results of affinity purification of the His-fusion protein. Purified
His-fusion protein was used to immunise mice, whose sera were used for ELISA (positive result)
and FACS analysis (Figure 18B). These experiments confirm that ORF79-1 (SEQ ID NO: 732) is a
surface-exposed protein, and that it is a useful immunogen.
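A molecular weight such as the one quoted for ORF79-1 is commonly estimated by summing average residue masses and adding one water molecule. The sketch below shows that calculation; the residue masses are standard approximate averages and the function name is hypothetical. It is not asserted to reproduce the 15.6 kDa figure, since the text does not say which form of the protein (with or without the leader peptide or the His tag) that figure refers to.

```python
# Approximate average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water per chain, for the free N- and C-termini


def average_mass_da(seq):
    """Estimated average mass in daltons of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[res] for res in seq.upper()) + WATER


def average_mass_kda(seq):
    """The same estimate expressed in kilodaltons."""
    return average_mass_da(seq) / 1000.0
```

Calling `average_mass_kda` on a full-length sequence and on the sequence with its predicted leader peptide removed gives a quick indication of which form a quoted mass most likely refers to.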

Example 88 

The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID 
NO: 739): 

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG

301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA

351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC

401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 
651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 
701 AA 
This corresponds to the amino acid sequence (SEQ ID NO: 740; ORF98):

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV

151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK

201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 741): 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 

251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG

301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA
351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC
401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 






This corresponds to the amino acid sequence (SEQ ID NO: 742; ORF98-1): 



1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV

151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK

201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF98 (SEQ ID NO: 740) shows 96.1% identity over a 233 aa overlap with an ORF (ORF98a)
(SEQ ID NO: 744) from strain A of N. meningitidis: 

                    10        20        30        40        50        60
orf98.pep   MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
            ||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98a      MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf98.pep   GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  :|
orf98a      GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf98.pep   SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY
            ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||
orf98a      SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
                   130       140       150       160       170       180

                   190       200       210       220       230
orf98.pep   IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX
            ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||
orf98a      IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
                   190       200       210       220       230

The complete length ORF98a nucleotide sequence (SEQ ID NO: 743) is: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG 

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This encodes a protein having amino acid sequence (SEQ ID NO: 744): 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG
101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ*

ORF98a (SEQ ID NO: 744) and ORF98-1 (SEQ ID NO: 742) show 98.7% identity in 233 aa 
overlap: 

                    10        20        30        40        50        60
orf98a.pep  MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98-1     MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf98a.pep  GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||| |||
orf98-1     GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf98a.pep  SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
            ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf98-1     SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
                   130       140       150       160       170       180

                   190       200       210       220       230
orf98a.pep  IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98-1     IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
                   190       200       210       220       230

Homology with a predicted ORF from N. gonorrhoeae 

ORF98 (SEQ ID NO: 740) shows 95.3% identity over a 233 aa overlap with a predicted ORF
(ORF98ng) (SEQ ID NO: 746) from N. gonorrhoeae:

                    10        20        30        40        50        60
orf98.pep   MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60
            ||  ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng     MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60

orf98.pep   GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 120
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||| :|
orf98ng     GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPVVKSIYSSVKKVSESLL 120

orf98.pep   SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180
            ||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||
orf98ng     SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180

orf98.pep   IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233
            ||||||||||||||||| ||||||||||||||||||||| ||| |||:|||||
orf98ng     IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233

The complete length ORF98ng nucleotide sequence (SEQ ID NO: 745) is predicted to encode a
protein having amino acid sequence (SEQ ID NO: 746): 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

Further work revealed the complete nucleotide sequence (SEQ ID NO: 747): 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG

451 TCGAATGGGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 
651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 
701 AA 

This corresponds to the amino acid sequence (SEQ ID NO: 748; ORF98ng-1):

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL

51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG

101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV

151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK

201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ*

ORF98ng-1 (SEQ ID NO: 748) and ORF98-1 (SEQ ID NO: 742) show 97.9% identity in 233 aa
overlap: 



                    10        20        30        40        50        60
orf98-1.pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng-1   MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf98-1.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf98ng-1   GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf98-1.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY
            ||||||||||||||||| |||||||||||||||||||||:||||||||||||||||||||
orf98ng-1   SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY
                   130       140       150       160       170       180

                   190       200       210       220       230
orf98-1.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX
            ||||||||||||||||||||||||||||||||||||||||||| |||:||||||
orf98ng-1   IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX
                   190       200       210       220       230

Based on this analysis, including the fact that the putative transmembrane domains in the 
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 



Example 89



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 749): 



1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 
51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG G^GgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 

351 GjygAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 

451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 

501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 

551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA . ATTCGTTAC 

601 GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 

651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATATCC GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 

751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 

801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 

901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 

951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC . . . 

This corresponds to the amino acid sequence (SEQ ID NO: 750; ORF100): 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI

51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF

101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA

301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA
351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH...

Further work revealed the complete nucleotide sequence (SEQ ID NO: 751): 



1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 

201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG
351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG
451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 
501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 
551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 
601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC 
651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 
701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 
751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 
801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 
851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 
901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 
951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 
1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG
1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC 
1201 GCAGCGTTAG AGCAGCATAG CTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 752; ORF100-1):



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH
401 AALEQHS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N.meningitidis (strain A) 

ORF100 (SEQ ID NO: 750) shows 93.5% identity over a 386 aa overlap with an ORF (ORF100a)
(SEQ ID NO: 754) from strain A of N. meningitidis: 

20 10 20 30 40 50 60 

orf 100 . pep MKT WW I WL FAAAVGLALAS G I YTGD VY I VLGQTMLR I NLHAF VLGS L I AVWWY FL F K 

llllllllllllll MINIM MMMMMMIMMM MMM MIMM 

orf 100a MKTWWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 

10 20 30 40 50 60 

25 70 80 90 100 110 120 

orf 100 . pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 

MIMM MMMMMMI I M M M M M M M M M M M M M 1 1 : Ml 

orf 100a FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

30 130 140 150 160 170 180 

orf 100. pep TLALMLXAHAAGQMENIXXRDRYLAE I AKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

MMM MMMMM M M M M M M M M M M M M M M M M M M M M I 

or f 1 0 0a TLALMLGAHAAGQMENIELRDRYLAE IAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

35 190 200 210 220 230 240 

orf 100 .pep AAAKMNANLTRLVRLX I RYAFDRGDALQVLAKTEKLSKAG ALGKS EMERYQNWA YRRQLA 

MMMMMMMI MMMMMMMMMI MMI MMMMMMMMI 

orf 100a AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXS KAGAXGKS EMERYQNWAYRRQLX 

190 200 210 220 230 240 

40 250 260 270 280 290 300 

or f 1 0 0 . pep DAADAAALKTCLKRI PDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 

MMMMMMIMMMMMMMMMMMMMMMMMMI MMMMI 

or f 1 0 0a DAADAAALKTCLKRI PDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 



45 
310 320 330 340 350 360

or f 100. pep FVES VRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEAS I AL 

I 1 1 1 1 i I M II 1 1 1 1 1 II I M I M I M 1 1 MlllhllllllllllllllMII 

or f 1 00a FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEAS IAL 

5 310 320 330 340 350 360 

370 380 
orf 100 .pep KPS I SARLVLTKVFDE I GEPQKAEAH 

Illllllllhlllll lllllllh 
or f 1 0 0a KPS I SARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 

10 370 380 390 400 

The complete length ORF100a nucleotide sequence (SEQ ID NO: 753) is:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC 

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT

1201 TCCGCCGAAA CCCATTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 754): 



1 MKTVVWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI

51 AVVVWYFLFK FIIGVLNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA

301 FVESVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP

401 SAETH*

ORF100a (SEQ ID NO: 754) and ORF100-1 (SEQ ID NO: 752) show 95.1% identity in 406 aa
overlap:

10 20 30 40 50 60

orf 100a. pep MKTVW I WLFAAAXGLALASG IXTGDVY I VLGQTMLR INLHAFVLGS L I AWVWYFLFK 

llllllllllllll IIIMIII I I I I I I I II I I I I M I I I I M I I' II I I I I I II I i 
orf 100- 1 MKTVVWIWLFAAAVGIAIASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
5 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 100a. pep FI IGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

lllllll 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 IIIMIII 

orf 100- 1 ' FI IGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

10 70 80 90 100 110 120 



130 140 150 160 170 180 

orf 100a . pep TLALMLGAHAAGQMENI ELRDRYLAE I AKLPEKQQLSRYLLLAES ALNRRDYEAAEANLH 

1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 M l i 1 1 1 1 1 1 1 1 1 1 M I [ 1 1 1 1 1 1 

orf 10 0 - 1 TLALMLGAHAAGQMENI ELRDRYLAE I AKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

15 130 140 150 160 170 180 



190 200 210 220 230 240 

orf 100a. pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

I II M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 Ml 1 1 Mill 1 1 1 1 1 1 1 II 1 1 M 1 1 1 1 

orf 100 - 1 AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLS KAGALGKS EMERYQNWA YRRQLA 

20 190 200 210 220 230 240 

250 260 270 280 290 300 

orf 100a. pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 ■ I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I 

or f 10 0 - 1 DAADAAALKTCLKRI PDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

25 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 100a. pep FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEAS IAL 

IIIIIIIIMIIIIII I llllllllllll II 1 1 II II , 1 1 1 II 1 1 1 1 1 II 1 1 

orf 100-1 FVES VRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEAS IAL 

30 ' 310 320 330 340 350 360 

370 380 390 400 

orf 100a. pep KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 

I 1 I I I I I 1 I I I I I I 1 I ■llllllllllll :h:::| M I I 
orf 100-1 KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX 

35 370 380 390 400 

Homology with a predicted ORF from N. gonorrhoeae 

ORF100 (SEQ ID NO: 750) shows 93.3% identity over a 386 aa overlap with a predicted ORF
(ORF100ng) (SEQ ID NO: 756) from N. gonorrhoeae:



orf 100 .pep MKT WW I WL FAAA VGLALAS G I YTGD VY I VLGQTMLR I NLHAFVLGS L I A WVWY FL F K 60 

1 1 1 1 M 1 1 1 1 1 II 1 1 1 1 1 1 M 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M II M 1 1 II II 1 1 1 1 

orf 100ng MKTWW I WLFAAAVGLALASG I YTGDVY I VLGQTMLR INLHAFVLGS L I AWVWYFLFK 60 

or f 1 0 0 . pep FI IGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 120 

I I I I I I I I I h h I II II I I I I II IIIMIII III IIIIIIMIII II : Ml 

or f 1 0 Ong FI IGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120 
orf 100 . pep TI^LMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180

IIMII llllllllll MMMMMIMMMMMIIIMIMIMMMIMM 

or f lOOng TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

orf 1 0 0 . pep AAAKMNANLTRLVRLX I RYAFDRGDALQVLAKTEKLS KAGALGKS EMERYQNWAYRRQLA 24 0 

5 II I I I II I I I II II I ^ I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 

orf 1 0 Ong AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 24 0 

orf 100 .pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 3 00 

1 1 1 1 1 M M II M I h M 1 1 1 1 1 1 1 1 1 1 1 1 Ml 1 1 i 1 1 1 1 M 1 1 1 ! 1 1 I M 1 1 1 1 1 1 

orf 1 OOng DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 300 

10 orf 100 .pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 360 

I ! 1 1 1 1 1 1 - 1 I I I ^ II M II I I M I I II I I M I : I I M I I - II I I I I ! I I I 

orf 1 0 Ong FVES VRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEAS I AL 360 

orf 100 .pep KPS I SARLVLTKVFDEIGEPQKAEAH 386 

MM llllhlllll MMh 
orf100ng KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405

The complete length ORF100ng nucleotide sequence (SEQ ID NO: 755) is:

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT

151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA 

201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT 

301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA

401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG

451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT

501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC 

601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG 

1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG

1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT 

1201 TCCGCCGAAA CCCGTTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 756): 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI
51 AVVVWYFLFK FIIGVLNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF
101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA
301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA
351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP
401 SAETR*



ORF100ng (SEQ ID NO: 756) and ORF100-1 (SEQ ID NO: 752) show 95.3% identity in 402 aa
overlap (upper sequence: orf100-1.pep; lower sequence: orf100ng):

10 20 30 40 50 60 

MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 

1 1 1 1 1 1 M 1 1 1 1 1 1 1 i II M 1 1 , 1 1 N II 1 1 1 1 1 1 1 i M I II I i I M 1 1 1 1 1 1 1 1 1 1 1 1 

MKTVVWIVVLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK
10 20 30 40 50 60 

70 80 90 100 110 120 

FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

IIMIIIIM: 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 , 1 1 1 1 1 1 1 1 1 1 1 1 . MINIM 

FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR
70 80 90 100 110 120 

130 140 150 160 170 180 

TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH

130 140 150 160 170 180 

. 190 , 200 210 220 230 240 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I : I 
AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA

190 200 210 220 230 240 

250 260 270 280 290 * 300 

DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
250 260 270 280 290 300 

310 320 330 340 350 360 

FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

I I I I I I I I I I I I 1 I I I I I I I I.: I E I I I ! I I I I I I E I I I I I I I I I I i I I I I I I I I I I I I I I 

FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 
310 320 330 340 350 360 

370 380 390 400 

KPSISARLVLAKVFDEIGEPQKAEAQRNLVLEAVSDDERHAALEQHSX

MM I I I I I I I I I I I : : II II II M II I = I : : : I : I 
KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX

370 380 390 400 



Based on this analysis, including the presence of a putative leader sequence, a putative
transmembrane domain, and an RGD motif, it is predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.
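
The RGD motif mentioned above can be located with a simple substring scan. The following is a minimal sketch and not part of the original analysis: the function name `find_motif` and its `offset` parameter are illustrative assumptions, and the test fragment is residues 201-250 of the ORF100ng sequence (SEQ ID NO: 756) as listed above.

```python
def find_motif(protein: str, motif: str = "RGD", offset: int = 1) -> list[int]:
    """Return the residue numbers at which `motif` starts in `protein`.

    `offset` is the residue number of the first character of `protein`
    (hypothetical helper parameter, so that a listing row can be scanned directly).
    """
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + offset)
        start = protein.find(motif, start + 1)  # allow overlapping hits
    return positions


# Row 201-250 of ORF100ng (SEQ ID NO: 756), with the listing's spaces removed.
ROW_201 = "FDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMADAADAAALKT"

if __name__ == "__main__":
    print(find_motif(ROW_201, "RGD", offset=201))
```

Under these assumptions the scan reports a single RGD motif beginning at residue 203.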
The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID
NO: 757):



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA



This corresponds to the amino acid sequence (SEQ ID NO: 758; ORF102): 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF*
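
The correspondence between a nucleotide listing and its amino acid sequence can be checked by conceptual translation. The sketch below assumes the standard genetic code and assumes that any codon containing an ambiguous base (such as `N` or `s`) is reported as `X`; the function name `translate` is illustrative and is not taken from the original analysis.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (first base slowest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a listing in frame 1; digits and spaces in the listing are ignored."""
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguous bases are not in the table and become 'X' (assumption).
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 21 bases of SEQ ID NO: 757.
    print(translate("ATGATGTTTT CTTGGTTCAA G"))
    # Last 30 bases of SEQ ID NO: 757 (nucleotides 400-429), including the ambiguous 's'.
    print(translate("CTGTATsTGGTCGTGTTCAAACCGTTTTGA"))
```

Translated this way, the ambiguous base in the final row gives the `X` at residue 136 of ORF102, and the codons GTC GTG that follow it give the two valines before the C-terminal FKPF.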

Further work revealed the complete nucleotide sequence (SEQ ID NO: 759): 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA



This corresponds to the amino acid sequence (SEQ ID NO: 760; ORF102-1): 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*

Computer analysis of this amino acid sequence gave the following results: 

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number
AE000647) (SEQ ID NO: 1160)



ORF102 (SEQ ID NO: 758) and HP1484 (SEQ ID NO: 1160) show 33% aa identity in 143 aa
overlap:

orf102    3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62
            F W K FH+ VISW A LFYLPR+FV A + V+ + +LY F++
HP1484    8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGVVQIQEK--KLYSFIASPAM 65

orf102   63 GAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119
            G + + + GW+H KL L ++LLAY YC +R + + R+Y
HP1484   66 GFTLITGILMLLIEPTLFKSGGWLHAKLALVVLLLAYHFYCKKCMRELEKDPTRRNARFY 125

orf102  120 RVFNEIPXXXXXXXXXXXXFKPF 142
            RVFNE P KPF
HP1484  126 RVFNEAPTILMILIVILVVVKPF 148






Homology with a predicted ORF from N. meningitidis (strain A) 



ORF102 (SEQ ID NO: 758) shows 99.3% identity over a 142 aa overlap with an ORF (ORF102a)
(SEQ ID NO: 762) from strain A of N. meningitidis:



                      10        20        30        40        50        60
orf102.pep    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102a       MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf102.pep    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102a       GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                      70        80        90       100       110       120

                     130       140
orf102.pep    VFNEIPVLLMVAALYXVVFKPFX
              ||||||||||||||| |||||||
orf102a       VFNEIPVLLMVAALYLVVFKPFX
                     130       140

The complete length ORF102a nucleotide sequence (SEQ ID NO: 761) is: 



1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 762): 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*

ORF102a (SEQ ID NO: 762) and ORF102-1 (SEQ ID NO: 760) show complete identity in 142 aa 
overlap: 



                      10        20        30        40        50        60
orf102a.pep   MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1      MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf102a.pep   GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf102-1      GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                      70        80        90       100       110       120

                     130       140
orf102a.pep   VFNEIPVLLMVAALYLVVFKPFX
              |||||||||||||||||||||||
orf102-1      VFNEIPVLLMVAALYLVVFKPFX
                     130       140

Homology with a predicted ORF from N. gonorrhoeae 

ORF102 (SEQ ID NO: 758) shows 97.9% identity over a 142 aa overlap with a predicted ORF
(ORF102ng) (SEQ ID NO: 764) from N. gonorrhoeae:



orf102.pep    MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60
              ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf102ng      MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60

orf102.pep    GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120
              |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf102ng      GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120

orf102.pep    VFNEIPVLLMVAALYXVVFKPF 142
              ||||||||||||||| ||||||
orf102ng      VFNEIPVLLMVAALYLVVFKPF 142

The complete length ORF102ng nucleotide sequence (SEQ ID NO: 763) is: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGCGCC GCGCGGCAAT CCCGAGTATG TGCGCCTGTC GGGGATGGCG 

151 GTGCGGTTGT ACCGTTTTAT GTCGCCTTTG GGTTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTATCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAAcg aAATCCCCGT GCTGCTGATG GTTGCCGCGC

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA

This encodes a protein having amino acid sequence (SEQ ID NO: 764): 



1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDAPRGN PEYVRLSGMA
51 VRLYRFMSPL GFGAVVFGAA IPFAAGRWGS GWVHVKLCLG LMLLAYQLYC
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYLVVFK PF*
ORF102ng (SEQ ID NO: 764) and ORF102-1 (SEQ ID NO: 760) show 98.6% identity in 142 aa
overlap:

                      10        20        30        40        50        60
orf102-1.pep  MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL
              ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf102ng      MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf102-1.pep  GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
              |||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||
orf102ng      GFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
                      70        80        90       100       110       120

                     130       140
orf102-1.pep  VFNEIPVLLMVAALYLVVFKPFX
              |||||||||||||||||||||||
orf102ng      VFNEIPVLLMVAALYLVVFKPFX
                     130       140
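
The identity figures quoted for these overlaps can be reproduced by counting matching columns in the alignment. The sketch below is illustrative only: the function name `percent_identity` is an assumption, the calculation simply divides identical columns by compared columns (the alignment program used for the original analysis may treat gaps differently), and the two sequences are ORF102-1 and ORF102ng as read from the nucleotide listings above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over the aligned (gap-free) columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    # Columns containing a gap character in either sequence are skipped (assumption).
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


ORF102_1 = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMA"
    "VRLYRFMSPLGFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYC"
    "GVLLRRFQDYSNAFSHRWYRVFNEIPVLLMVAALYLVVFKPF"
)
ORF102NG = (
    "MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMA"
    "VRLYRFMSPLGFGAVVFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYC"
    "GVLLRRFQDYSNAFSHRWYRVFNEIPVLLMVAALYLVVFKPF"
)

if __name__ == "__main__":
    print(round(percent_identity(ORF102_1, ORF102NG), 1))
```

The two sequences differ at residues 36 (V/A) and 77 (W/R), so 140 of 142 positions match, which rounds to the 98.6% stated above.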

In addition, ORF102ng (SEQ ID NO: 764) shows significant homology to a membrane protein
(SEQ ID NO: 1160) from H. pylori:



gi|2314656 (AE000647) conserved hypothetical integral membrane protein
[Helicobacter pylori] Length = 148
Score = 79.2 bits (192), Expect = 1e-14
Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%)

Query: 3
            F W K FH+ VISW A LFYLPR+FV A V+ + +LY F+ +
Sbjct: 8

Query: 63
            G + F +G GW+H KL L ++LLAY YC +R +
Sbjct: 66

Query: 116
            R+YRVFNE P KPF
Sbjct: 122




Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 91 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 765): 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 
51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA
101 TTACGGAAAC GGTCAGGCGC GGC // 
//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG

451 CCGCGCCGAT AA

This corresponds to the amino acid sequence (SEQ ID NO: 766; ORF85): 



1 MAKMMKWAAV AAVAAAAVWG GWS.LKPEPH VLDITETVRR G

51 

101 

151 

201 I SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

Further work revealed the further partial nucleotide sequence (SEQ ID NO: 767): 



1 . . GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 

51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 

101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 

151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 

251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 

301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA

351 GTCGGAATTG GGCTACACGC GCATTACGGC AACGATGGAC GGCACGGTGG

401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG
451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA
501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT 
551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 
601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC 
651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA
701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA 
751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA 
801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG 
851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA 
901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC 
951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC 

1001 GATAA 

This corresponds to the amino acid sequence (SEQ ID NO: 768; ORF85-1): 



1 . . VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY 

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD AFAAAKANVA 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTVVAILVEE GQTVNAAQST

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFTILSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV 

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF85 (SEQ ID NO: 766) shows 87.8% identity over a 41 aa overlap and 99.3% identity over
a 153 aa overlap with an ORF (ORF85a) (SEQ ID NO: 770) from strain A of N. meningitidis:



10 20 30 40 

MAKMMKWAAVAAVAAAAVWGGWS - LKPEPHVLDITETVRRG 

Illllllllllllllllllllll HIM- llllllll 

MAKMMKWAAVAAVAAAAVWGGWS YLKPEPQAAY I TETVRRGD I SRTVS ATGE I S PSNLVS 
10 20 30 40 50 60 

// 

80 90 100 
ISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

MM llllllll III llllllll Mill II 

T I VQLANLDMMLNKMQ I AEGD I TKVKAGQD I S FT I LSEPDTP I KAKLDS VDPGLTTMS SG 
210 220 230 240 250 260 

110 120 130 140 150 160 

GYNSS TDTASNAVYYYARS FVPNPDGKLATGMTTQNTVE I DGVKNVL 1 1 PSLTVKNRGGK 

1 1 1 1 1 1 1 II 1 1 II I II 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 1 1 1 1 1 II 1 1 1 1 1 1 1 1 h 

GYNSS TDTASNAVYYYARS FVPNPDGKLATGMTTQNTVE I DGVKNVL 1 1 PS LTVKNRGGR 
270 280 290 300 310 320 

170 180 190 200 210 220 

AFVRVLGADGKAAERE I RTGMRDSMNTE VKS GLKEGDKW I S E I TAAEQQESGERALGGP 

I I I II I II I I II II I II I II I I I II I II I I II I II II I II I II I II II I I I II II I II I I 
AFVRVLGADGKAAERE I RTGMRDSMNTEVKSGLKEGDKWI SE I TAAEQQESGERALGGP 

330 340 350 360 370 380 

230 
PRRX 

MM 

PRRX 
390 

The complete length ORF85a nucleotide sequence (SEQ ID NO: 769) is: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG 

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG 

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA 

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG 

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT 



951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA

This encodes a protein having amino acid sequence (SEQ ID NO: 770): 

1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATAK EDLESAQDAL
151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT VVAILVEEGQ
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM
351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

ORF85a (SEQ ID NO: 770) and ORF85-1 (SEQ ID NO: 768) show 98.2% identity in 334 aa 
overlap: 



30 40 50 60 70 80 

20 orf 85a .pep PQAAY I TETVRRGD I SRTVS ATGE I S PSNLVS VGAQASGQ I KKLYVKLGQQVKKGDL I AE 

IIIIIIIIIMI I i I I I I I I I I I M I I I I 
orf 85- 1 VSVGAQASGQI KI LYVKLGQQVKKGDLI AE 

10 20 30 



90 100 110 120 130 140 

25 orf 85a . pep INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 

IM II IMIMIIIIIII IIIIIIIIIIMI MMMMIIMMMMMIMI 

orf 85-1 INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 

40 50 60 70 80 90 



150 160 170 180* 190 200 

30 orf 85a . pep ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST 

M II I M I II I I I II I I I II II M I I II M I I I M I M I I I I I I I I I I M MM M I I 
orf 85-1 AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 

100 110 120 130 140 150 



210 220 230 240 250 260 

35 orf 85a. pep PT I VQLANLDMMLNKMQ I AEGD I TKVKAGQD I S FT I LSEPDTP I KAKLDS VDPGLTTMS S 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 ' 1 1 1 1 1 1 1 1 1 1 1 1 1 ' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Ml M I 

orf 85-1 PT I VQLANLDMMLNKMQ I AEGD I TKVKAGQD I S FT I LS EPDTP I KAKLDS VDPGLTTMS S 

160 170 180 190 200 210 



270 280 290 300 310 320 

40 orf 85a . pep GGYNSSTDTASNAVYYYARS FVPNPDGKLATGMTTQNTVE I DGVKNVL 1 1 PS LTVKNRGG 

MUM MMIMMIIMIII MIMMMMM MMMMIMIMM MM 

or f 8 5 - 1 GGYNSSTDTASNAVYYYARS FVPNPDGKLATGMTTQNTVE I DGVKNVL 1 1 PS LTVKNRGG 

220 230 240 250 260 270 



330 340 350 360 370 380 

45 orf 85a . pep RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 

Ml 1 1 II II 1 1 II 1 1 1 1 1 II II M 1 1 II II I Ml 1 1 Ml M 1 1 M Ml Ml M II 1 1 

or f 8 5 - 1 KAFVRVLGADGKAAERE I RTGMRDSMNTEVKSGLKEGDKW I SE I TAAEQQESGERALGG 

280 290 300 310 320 330 



390 
orf85a.pep PPRRX

|||||

orf85-1 PPRRX

Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a (SEQ 
ID NO: 770). 

Homology with a predicted ORF from N. gonorrhoeae 

ORF85 (SEQ ID NO: 766) shows a high degree of identity with a predicted ORF (ORF85ng) (SEQ 
ID NO: 772) from N. gonorrhoeae: 



ORF85 1 MAKMMKWAAVAAVAAAAVWGGWS . LKPEPHVLDITETVRRG 40 

Illllllllllllllllllllll Mill:: Mhllll 
0RF85ng 1 MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50 



ORF85 ISFTILSEPDT 250 

IMIMIMM 

ORF85ng 201 TVNAAQSTPT I VQLANLDMMLNKMQ I AEGD I TKVKAGQD I S FT I LS EPDT 250 

ORF85 2 51 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 3 00 

1 1 1 1 1 1 M M T 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I 1 1 1 1 1 1 1 M I 

ORF85ng 251 PIKAKLDSVDPGLTTMSSGGYNSSTDTASNAVYYYARSFVPNPDGKLATG 3 00 

0RF85 301 MTTQNTVEIDGVKNVLI I PSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350 

i I I I I I I I I I I . I I I : I I I I [! I 1 I I I I I I I I : 1 I I I 1 I IIIIIMI 
ORF85ng 301 MTTQNTVE I DGVKNVLL I PSLTVKNRGGKAFVRVLGADGKAVERE I RTGM 350 

ORF85 152 RDSMNTEVKSGLKEGDKWI SE I TAAEQQESGERALGGPPRR 3 93 

MMMMMMMMMMMMMMMMMIMIMM 

ORF85ng 351 KDSMNTEVKSGLKEGDKWISE I TAAEQQESGERALGGPPRR 393 



The complete length ORF85ng nucleotide sequence (SEQ ID NO: 771) is: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac
51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA
101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG
151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTTCGGG
201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG
251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG
301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT
351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA
401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT
451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG
501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA
551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG
601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT
651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG
701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC
801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG
1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG
1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA
1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC
1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA




This encodes a protein having amino acid sequence (SEQ ID NO: 772): 



1 MAKMMKWAAV AAVAAAAVWG GWSYLKPEPQ AAYITEAVRR GDISRTVSAT
51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM
101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLESAQDAL
151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT VVAIPVEEGQ
201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT
251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG
301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM
351 KDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR*

ORF85ng (SEQ ID NO: 772) and ORF85-1 (SEQ ID NO: 768) show 96.1% identity in 334 aa 
overlap: 

30 40 50 60 70 80 

20 or f 8 5ng PQAAY I TETVRRGD I SRTVSATGE I S PSNLVSVGAQASGQI KKLYVKLGQQVKKGDL I AE 

Illlllllllll lilllllllllllllll 
orf 85-1 VSVGAQASGQIKILYVKLGQQVKKGDLIAE 

10 20 30 



90 100 110 120 130 140 

25 orf 85ng INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD 

Nihil!:: | | | | | | I | M I I I I I I I I ! I I I I I I I I I II I M - I I I I' I I I I I I I 
orf 85-1 INSTSQTNTLNTEKSKLETYQAKLVSAQI ALGSAEKKYKRQAALWKENATSKEDLESAQD 

40 50 60 70 80 90 



150 160 170 180 190 200 

30 orf 85ng ALAAAKANVAELKAL I RQS KI S INTAESDLGYTR I TATMDGTWAI PVEEGQTVNAAQST 

I : I I I I I I I I I II I I I M I I I I I I I I I h i ! I I I I I I I I I I I I M IIIIIIMIIMI 
orf 85-1 AFAAAKANVAEL KAL I RQS K I S I NT AES E LG YTR I TATMDGT WA I L VEEGQTVNAAQS T 

100 110 120 130 140 150 



210 220 230 240 250 260 

35 orf 85ng PTI VQLANLDMMLNKMQIAEGDI TKVKAGQD IS FT I LSEPDTP I KAKLDSVDPGLTTMSS 

I II Ml II II II 1 1 II 1 1 II II II II Ml MM II MINI II II II II MM I II 1 1 II 

orf 85-1 PT I VQLANLDMMLNKMQ I AEGD I TKVKAGQD I S FT I LS EPDTP I KAKLDSVDPGLTTMSS 

160 170 180 190 200 210 



270 280 290 300 310 320 

40 O r f 8 5 ng GGYNS S TDTASNAVY Y YARS FVPNPDGKLATGMTTQNTVE I DGVKNVLL IPS LTVKNRGG 

II I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M II 1 1 1 1 II 1 1 1 1 II I II M 1 1 M h M II II II 1 1 1 

orf 85-1 GGYNS STDTASNAVYYYARS FVPNPDGKLATGMTTQNTVE I DGVKNVLI I PSLTVKNRGG 

220 230 240 250 260 270 



330 340 350 360 370 380 

45 orf 85ng KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 

I M 1 1 1 1 1 1 1 1 1 1: 1 1 1 1 1 1 1 h 1 1 1 1 I M I M M 1 1 1 1 1 1 1 1 1 1 1 : 1 I M 1 1 1 M I 

orf 85 - 1 KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 

280 290 300 310 320 330 
390

orf 85ng PPRRX 
MM 

orf 85-1 PPRRX 

In addition, ORF85ng (SEQ ID NO: 772) shows significant homology to an E. coli membrane
fusion protein (SEQ ID NO: 1161):

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from membrane
fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia coli] Length = 380

Score = 193 bits (485), Expect = 2e-48
Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%)

Query: 29  PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88
           P Y T VR GD+ ++V ATG+ + V VGAQ SGQ+K L V +G +VKK L+
Sbjct: 41  PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100

Query: 89  INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148
           I+ N I ++ L +A+ A+ L A Y RQ L + A S + +
Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160

Query: 149 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 208
            I + + + + S++TA+++L YTRI A M G V I +GQTV AAQ
Sbjct: 161 EMAVKQAQIGTIDAQIKRNQASLDTAKTNLDYTRIVAPMAGEVTQITTLQGQTVIAAQQA 220

Query: 209 PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS 268
            P I+ LA+ + ML K Q++E D+ +K GQ FT+L +P T + ++ VP
Sbjct: 221 PNILTLADMSAMLVKAQVSEADVIHLKPGQKAWFTVLGDPLTRYEGQIKDVLP 273

Query: 269 GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG 328
            + + ++A++YYAR VPNP+G L MT Q + + + VKNVL IP + + G
Sbjct: 274 TPEKVNDAIFYYARFEVPNPNGLLRLDMTAQVHIQLTDVKNVLTIPLSALGDPVG 328

Query: 329 KAFVRV-LGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISE 372
            +V L +G+ ERE+ G ++ + E+ GL+ GD+VVI E
Sbjct: 329 DNRYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEVVIGE 373






Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF85-1 (SEQ ID NO: 768) (40.4 kDa) was cloned in the pGex vectors and expressed in E. coli, as 
described above. The products of protein expression and purification were analyzed by SDS-PAGE. 
Figure 19A shows the results of affinity purification of the GST-fusion protein. Purified 
GST-fusion protein was used to immunise mice, whose sera were used for Western blot (Figure 
19B), FACS analysis (Figure 19C), and ELISA (positive result). These experiments confirm that 
ORF85-1 (SEQ ID NO: 768) is a surface-exposed protein, and that it is a useful immunogen. 






Example 92 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 773): 

1 ..ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT
51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA
101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG
151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG
201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT
251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG
301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA
351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA
401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG
451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC
501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC
551 CGTAA



15 



This corresponds to the amino acid sequence (SEQ ID NO: 774; ORF120): 



1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR
51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG
101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP
151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP*
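
The correspondence between the partial DNA sequence and the ORF120 amino acid sequence can be checked by translating the DNA in frame. The following Python sketch is illustrative only: the function name `translate` and the choice of fragments are assumptions made here, the standard genetic code is assumed, and it is not the software used for the original analysis.

```python
from itertools import product

# Standard genetic code, with codons enumerated over the bases T, C, A, G
# (first base slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1 (hypothetical helper).

    Whitespace is ignored, lower-case bases are accepted, a stop codon is
    written as '*', and any codon containing an ambiguous base becomes 'X'.
    A trailing incomplete codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 bases of the partial sequence (SEQ ID NO: 773).
print(translate("ATTCCCGCCA CGATGACATT TGAACGCAGC"))
# In-frame fragment spanning the end of row 51 and the start of row 101.
print(translate("GGCGGTACGGTTGTCGGCAAT"))
```

The first call reproduces the opening residues of SEQ ID NO: 774, and the second covers the Gly-Gly-Thr-Val-Val-Gly-Asn stretch, where two adjacent valine codons (GTT GTC) are present in the DNA.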

Further work revealed the complete nucleotide sequence (SEQ ID NO: 775): 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA
651 CGGCCAGGCA GCCAAACCGT AA

This corresponds to the amino acid sequence (SEQ ID NO: 776; ORF120-1): 

1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 
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ORF120 (SEQ ID NO: 774) shows 92.4% identity over a 184aa overlap with an ORF (ORF120a) 
(SEQ ID NO: 778) from strain A of N. meningitidis: 

10 20 30 

orf 120 .pep IPATMTFERSGNAYKIVSTIKVPLYNIRFE 

5 I I I I : II I M I I I I I i II I ! I I I 

orf 120a SAA I LSAALPCAYAAGLPXS AVLHYSGS YGI PATXXXXXXXNAXKI VST I KVPLYNI RFE 

10 20 30 40 50 60 

40 50 60 70 80 90 

orf 12 0 .pep SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 

10 ' MINIMI! Ml Mil Mil I III II II II II Ml I II = MMMMMMMI 

orf 12 0a SGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL 
70 - 80 90 100 110 120 

100 110 120 130 140 150 

orf 120 .pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 

15 1 1 1 1 1 1 II I M 1 1 1 1 1 1 II 1 1 1 II Ml M IN I ! I I II 1 1 1 1 1 Ml II 1 1 M 1 1 1 i 1 1 1 

orf 120a AANDAKLPPGLKI TNGKKLYSVGGLNKAGTGKYS IGGVETEWKYRVRRGDDAVMYFFAP 

130 140 150 160 170 180 

160 170 180 

orf 120 . pep S LNNI PAQ I GYTDDGKTYTLKLKS VQ INGQAAKPX 
20 | | | | | | | | | || | | || | || | | | | | | | | | | | | | | | | | 

orf 120a S LNNI PAQ I GYTDDGKTYTLKLKSVQ INGQAAKPX 

190 200 210 220 

The complete length ORF120a nucleotide sequence (SEQ ID NO: 777) is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC
151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA
651 CGGCCAGGCA GCCAAACCGT AA

This encodes a protein having amino acid sequence (SEQ ID NO: 778): 

1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX
51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

ORF120a (SEQ ID NO: 778) and ORF120-1 (SEQ ID NO: 776) show 93.3% identity in 223 aa 
overlap: 
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10 20 30 40 50 60 

orf 120a . pep MMKTFKNI FSAAI LS AALPCAYAAGLPXSAVLHYSGS YGI PATXXXXXXXNAXKI VST I K 

llllllll MM IIIIIIIIMIII MIIIIMIIMIII = II lllllll 

orf 120-1 MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 



10 



70 80 90 100 110 120 

orf 120a ; pep VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM 

IMMIMMIMIIII MIIIIIIIIMMMMIII MIMI : MINI 

orf 12 0 - 1 VPLYNIRFESGGTWGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM 

70 80 90 100 110 120 



15 



130 140 150 160 170 180 

orf 12 0a. pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 

MMMMMMMMM MMMMMMMM IMMMMMMMMMIMM 

orf 120-1 DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGD 

130 140 150 160 170 180 
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190 200 210 220 

orf 120a. pep DAVMYFFAPSLNN I P AQ I GYTDDGKTYTLKLKS VQ INGQAAKPX 

M M M M M M M M M M M M M I M I MM M M M M I 

orf 120- 1 DAVMYFFAPSLNN IPAQ I GYTDDGKTYTLKLKSVQ INGQAAKPX 

190 200 210 220 



Homology with a predicted ORF from N. gonorrhoeae 

ORF120 (SEQ ID NO: 774) shows 97.8% identity over 184 aa overlap with a predicted ORF 
(ORF120ng) (SEQ ID NO: 780) from N. gonorrhoeae: 



orf120.pep                                IPATMTFERSGNAYKIVSTIKVPLYNIRFE  30
                                          ||||||||||||||||||||||||||||||
orf120ng    SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIKVPLYNIRFE  69

orf120.pep  SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL  90
            |||||||||||| || ||||||||||||||||||||||||||||||||||||||||||||
orf120ng    SGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 129

orf120.pep  AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP 150
            |||||||||||||||||||||||||||||||||||||||||||||||||||| | |||||
orf120ng    AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDTVTYFFAP 189

orf120.pep  SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 184
            ||||||||||||||||||||||||||||||||||
orf120ng    SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKP 223

The complete length ORF120ng nucleotide sequence (SEQ ID NO: 779) is: 



1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC
51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG
201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT
251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC
301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG
401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC
451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA
551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT
601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA
651 CGGACAGGCC GCCAAACCGT AA



This encodes a protein having amino acid sequence (SEQ ID NO: 780): 



1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY
201 TDDGKTYTLK LKSVQINGQA AKP*

In comparison with ORF120-1 (SEQ ID NO: 776), ORF120ng (SEQ ID NO: 780) shows 97.8% 
identity in 223 aa overlap: 



20 



10 20 30 40 50 60 

orf 120-1 .pep MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

Illlllllllllllllllllllll I I I I I I I I M I II I I I I I II I II I I I I I I M I I 
orf 120ng MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK 

10 20 30 40 50 60 



25 



70 80 90 100 110 120 

or f 12 0 - 1 . pep VP L YN I RFESGGTWGNTLHPTYYRD I RRGKL YAEAKFADGS VT YGKAGES KTEQS P KAN 

Mill II II MM 1 1 Nihil M 1 1 II I II II Mil 1 1 Ml 1 1 II II 1 1 INI IN 

orf 120ng VPLYNI RFESGGTWGNTLHPA YYKD I RRGKL YAEAKFADGS VT YGKAGES KTEQS PKAM 

70 80 90 100 110 120 



30 



130 140 150 160 170 180 

orf 12 0-1. pep DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYS I GGVETE WKYRVRRGD 

1 1 M M I II 1 1 1 1 1 1 1 1 1 1 M M I II II I I II 1 1 II 1 1 1 1 1 1 1 1 M II 1 1 1 II I M I 

orf 12 0ng DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYS I GGVETE WKYRVRRGD 

130 140 150 160 170 180 
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190 200 210 220 

orf 120 - 1 . pep DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 
hi I I I I I I I I I I II M I II I I I II I I I M I II I II I I II I I 
orf 120ng DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX 

190 200 210 220 
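
The identity figures quoted for these comparisons are the fraction of alignment columns in which the two residues are the same. A minimal Python sketch of that calculation follows; the function name `percent_identity` is an assumption made here for illustration, the two sequences are taken to be already aligned, and the alignment program used for the original comparisons is not specified in this text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of two pre-aligned sequences.

    Hypothetical helper: gap characters ('-') never count as matches, and
    the denominator is the full alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Residues 81-110 of ORF120-1 (SEQ ID NO: 776) and ORF120ng (SEQ ID NO: 780).
orf120_1 = "PTYYRDIRRGKLYAEAKFADGSVTYGKAGE"
orf120ng = "PAYYKDIRRGKLYAEAKFADGSVTYGKAGE"
print(round(percent_identity(orf120_1, orf120ng), 1))
```

Over the full 223-residue overlap the two sequences listed above differ at five positions, and 218/223 rounds to the 97.8% stated.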



40 



This analysis, including the presence of a putative leader sequence in the gonococcal protein, 
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 93 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 781): 



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC
51 .GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC
401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AGGCAGGGCG GCAATATT..

This corresponds to the amino acid sequence (SEQ ID NO: 782; ORF121): 



1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNI..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 783): 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC
51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA
101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC
151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT
201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC
251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA
301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG
351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC
401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG
451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC
501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA
551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA
601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT
651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC
701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC
751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT
801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA
901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT
951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG
1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC
1051 AGTTTTTACC GGGGCAGGTA G

This corresponds to the amino acid sequence (SEQ ID NO: 784; ORF121-1): 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 
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ORF121 (SEQ ID NO: 782) shows 98.7% identity over a 156aa overlap with an ORF (ORF121a) 
(SEQ ID NO: 786) from strain A of N. meningitidis: 

10 20 4 30 40 50 60 

orf 121. pep M YRRKGRG I KPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

5 lllllllllll II I M 1 1 1 1 1 II 1 1 1 1 II I M Ml I M II 1 1 1 1 i M 1 1 1 1 M I 

orf 121a M YRRKGRG I KPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 121. pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

10 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 I ; I I I I I I I I I I I I I I I I I I I I I M I I I i 

orf 121a ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 

130 140 150 

orf 121 .pep E IDQAS I I AWLQAHTGELSNALKAWFPVLMRQGGNI 

15 lllllll llllllllllllllllllll!l III 

orf 121a EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 

130 140 150 160 170 180 



20 



or f 12 la SCGI AKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 

190 200 210 220 230 240 

The complete length ORF121a nucleotide sequence (SEQ ID NO: 785) is: 



1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

25 151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

30 401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

35 651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT 

751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 

40 901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This encodes a protein having amino acid sequence (SEQ ID NO: 786): 

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG
351 SFYRGR*

ORF121a (SEQ ID NO: 786) and ORF121-1 (SEQ ID NO: 784) show 99.2% identity in 356 aa 
overlap: 



10 20 30 40 50 60 

orf 121a . pep MYRRKGRG I KPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

MINI II III II I IIIIIIMIIMIIIMIIIIMMMIIII IIIMIIIU 

10 or f 1 2 1 - 1 MYRRKGRG I KPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 



15 



70 80 90 100 110 120 

orf 12 la. pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

M : I M i II 1 1 1 1 M 1 1 1 M II 1 1 1 1 1 1 1 1 1 1 1 1 1 i i 1 1 1 .1 1 1 M 1 : 1 1 1 1 1 1 1 1 1 

orf 12 1-1 ASASMSVMVFSLILLLALLLI I VPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 . 100 110 120 



20 



130 140 150 160 170 180 

orf 12 la. pep E IDQAS I I AWLQAHTGELSNALKAWFPVLMRQGGNI VSS IGNLLLLPLLLYYFLLDWQRW 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 12 1 - 1 E IDQAS I I AWLQAHTGELSNALKAWFPVLMRQGGNI VSS IGNLLLLPLLLYYFLLDWQRW 

130 140 150 160 170 180 



25 



190 200 210 220 230 240 

orf 121a . pep S CG I AKLVPRRFAGA YTRI TGNLNE VLGE FLRGQLLVML IMGLVYGLGL VLVGLDSGFA I 

lllll II III MM 1 1 1 MM Mill I'll Mil 1 1 HUM MM 1 1 1 MM IIMI 

or f 12 1 - 1 SCGI AKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 

190 200 210 220 230 240 



250 260 270 280 290 300 

GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 

M 1 1 1 1 1 II 1 1 1 1 1 1 1 M 1 1 M I II M II II 1 1 I M 1 1 1 1 1 1 II 1 1 1 1 1 1 M 1 1 1 1 1 

GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 
250 260 270 280 290 300 

310 320 330 340 350 

DRIGLSPFWV I FSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKY FAGS FYRGRX 

II II I M 1 1 II II II II II I II 1 1 M 1 1 1 III I i 1 1 1 1 1 1 1 II 1 1 1 II 1 1 1 M II 

DRIGLSPFWV I FSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKY FAGS FYRGRX 
310 320 330 340 350 

Homology with a predicted ORF from N. gonorrhoeae 

ORF121 (SEQ ID NO: 782) shows 97.4% identity over a 156 aa overlap with a predicted ORF 
(ORF121ng) (SEQ ID NO: 788) from N. gonorrhoeae: 

40 orf 121 .pep MYRRKGRG I KPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 

llllllllllllllll I I M I I I I I : I I I I Mi I I I I I I I I I M I I I I I I 1 1 1 i 1 1 1 1 
orf 121ng MYRRKGRG I KPWMGAGAAFAALVWLVYALGDTLTP FAVAAVLAYVLDPLVEWLQKKGLNR 60 



orf 121a .pep 
30 orfl21-l 

orf 121a .pep 
35 orfl21-l 
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or f 121. pep AS ASMSVMVFS L I LLLALLL 1 1 VPMLVGQFNNLASRLPQL I GFMQNTLLPWLKNT I GGYV 120 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 111 I I I I I I 

orf 12 lng ASASMSVMVFS LI LLLALLL I I VPMLVGQFNNLASRLPQL I GFMQNTLLPWLKNT I GGYV 120 

orfl21.pep EIDQAS I IAWLQAHTGELSNALKAWFPVLMRQGGNI 156 

II II I lillMIM I I II Mlllll IN -I II I 

orf 12 lng EIDQAS I IAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180 

An ORF121ng nucleotide sequence (SEQ ID NO: 787) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 788): 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM
151 KQGGNIVSTI GNLLLPPLLL YYFLLDWHRW SCGIPKLVPR RFAGAYTRIT
201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW
251 GGG*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 789): 



1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC 

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 



This corresponds to the amino acid sequence (SEQ ID NO: 790; ORF121ng-1): 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM
151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG
301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG
351 SFYRGR*

ORF121ng-1 (SEQ ID NO: 790) and ORF121-1 (SEQ ID NO: 784) show 97.5% identity in 356 aa 
overlap: 

10 20 30 40 50 60 

or f 12 1 - 1 . pep MYRRKGRG I KPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

5 1 1 i I M I II M 1 1 1 1 1 1 II 1 1 1 1 1 M II 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 II 1 1 1 1 i 

or f 1 2 1 ng - 1 MYRRKGRG I KPWMGAGAAFAALVWLVYALGDTLTP FAVAAVLA YVLD PL VEWLQKKGLNR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 121-1 .pep AS ASMS VMVFS L I LLLALLL 1 I VPMLVGQFNNLASRLPQL I GFMQNTLLPWLKNT I GGYV 

10 I III II III II Mill II II II I II II III I! II I II II I II II MM II I! Ill II 

orf 121ng-l ASASMS VMVFS LI LLLALLL 1 1 VPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 12 1 - 1 . pep EIDQAS 1 I AWLQAHTGELSNALKAWFPVLMRQGGNI VSS IGNLLLLPLLLYYFLLDWQRW 

15 1 1 1 1 1 1 1 1 1! : I M 1 1 1 1 ! 1 1 M I M 1 1 1 : 1 1 1 1 1 1 1 1 ; I M I 1 1 1 ! M 1 1 1 1 1 1 M 

orf 121ng-l EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 12 1 - 1 . pep SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 
20 | | | M | | | | | | | | | | | | | | | | | | | || | | | | | | | | | | | | | | | | | | | | | | | : | I | | I I I I I I 

orf 12 ing- 1 SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 121-1 .pep GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG 
25 | | : | | | | | | | | | | | | | | | | | | | | | | || | | | | | | | | | | : | | | | | | | | | | | | | | | | | I I I I I 

orf 12 lng- 1 GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG 

250 260 270 280 290 300 

310 320 330 340 350 

orf 12 1 - 1 . pep DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX 

30 | | | | | || | | | M I I I I I I = I I I II I I I I I I I I I I I I I I I I I I h I I I I I I I I I I I I I 

orf 121ng-l DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX 

310 320 330 340 350 

In addition, ORF121ng-1 (SEQ ID NO: 790) shows homology to a permease (SEQ ID NO: 1162) 
from H. influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349 
Score = 69.9 bits (168), Expect = 2e-11 

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%) 

Query: 26 VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84 
+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 



45 



Query : 


26 


Sbjct : 


32 


Query : 


85 


Sbjct: 


92 


Query : 


144 



ML Q +L S LP + N WL N YEID+++F+ 
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+ + + N+VS D G+++ +P+ A+ R + 

Sbjct: 148 SAVKLSLASII^LVSLGIYAFLVPLMMFFMLKDKSELLQGVSRFLPKNRNLAFXRWK-EM 206 

Query: 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 263 

+ + ++ G+ + + G+ V VPY 

Sbjct: 207 QQQ I SNY I HGKLLE I L I VTL I TY I I FL I FGLNYPLLLAFAVGLS VLVP Y I GAVI VT I PVA 266 

Query: 264 XXXXXQFGSWNGILAWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323 

QFG + FAV QL+ +P+ ++LP +1 S++ FG L GF 

Sbjct: 267 LVALFQFGISPTFWYI I IAFAVSQLLDGNLLVPYLFSEAVNLHPLI I I ISVLIFGGLWGF 326 

Query: 324 VGMLAGLPLAAVTLVLL 340 

G+ +PLA + ++ 
Sbjct: 327 WGVFFAI PLATLVKAVI 343 

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
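
Putative leader sequences and transmembrane domains of this kind are commonly located by scanning a protein for stretches of hydrophobic residues. The sketch below computes a sliding-window Kyte-Doolittle hydropathy profile. It is an illustration under stated assumptions: the text does not say which prediction method was used, the function names are hypothetical, and the 19-residue window with a 1.6 cut-off is a conventional choice rather than a value taken from this document.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window (hypothetical helper).

    Unknown residues such as 'X' are scored as 0.0.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(seq: str, window: int = 19,
                        threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows at or above the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


# Residues 1-90 of ORF121-1 (SEQ ID NO: 784).
segment = (
    "MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLV"
    "EWLQKKGLNRASASMSVMVFSLILLLALLLIIVPMLVGQF"
)
profile = hydropathy_profile(segment)
print(round(max(profile), 2), hydrophobic_windows(segment)[:5])
```

Windows whose mean exceeds the cut-off mark candidate membrane-spanning stretches; such a scan only flags candidates and does not by itself establish topology or surface exposure.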

Example 94 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 791): 

1 ..ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT
51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC
301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG..

This corresponds to the amino acid sequence (SEQ ID NO: 792; ORF122): 

1 ..TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC
101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 793): 



1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG
151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC
301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG
351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT
751 CGTCATCGTT TGTGTTCCTG A

This corresponds to the amino acid sequence (SEQ ID NO: 794; ORF122-1): 

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR
101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV
251 RHRLCS*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF122 (SEQ ID NO: 792) shows 94.0% identity over a 182 aa overlap with an ORF (ORF122a) 
(SEQ ID NO: 796) from strain A of N. meningitidis: 

10 20 30 

25 orf 122 .pep TAFSAALRLS PSXLVI FLS FGKPYQQTAAI 

Illllhlll I Ml lllllllllllll 
orf 122a FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWI FLS FGKPYQQTAAI 

30 40 50 60 70 80 

40 50 60 70 80 90 

30 orf 122 . pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPE I AEFFVGFAFDVDARNVYAQ IGGDVGTHLR 

1 1 1 1 MINIM lllllllllllll IIMMIIII MIM lllllllllllll 

orf 122a LTFFXTSCPPRSNPYQQYRRLRLYAFHAPE I TEFFVGFAFXVDARNVYAQ IGGDVGTHLR 

90 100 110 120 130 140 

100 110 120 130 140 150 

35 orf 122 . pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 

hill IMIIIIIIMIU I IMIIIIMMIMIIIIII IIIIIIIIMIIIM 

orf 122a NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAATOIFELCGGVGEMAADIAQTCRT 

150 160 170 180 190 200 

160 170 180 

40 orf 122 .pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 

I I I I I I I I I I I I I I I I I I I I 'I I I I I I I I I 
orf 122a EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 
210 220 230 240 250 

The complete length ORF122a nucleotide sequence (SEQ ID NO: 795) is: 

1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA
101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG
151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT
251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC
301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG
351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG
401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT
751 CGTCATCGTT TGTGTTCCTG A

This encodes a protein having amino acid sequence (SEQ ID NO: 796): 

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR 
101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 
251 RHRLCS* 
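
The relationship between a nucleotide listing and the protein it encodes can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code, reading frame 1, and the convention that a codon containing N is reported as X unless every possible base gives the same residue (which appears consistent with the listings here, where GCN is shown as A and NNN as X). The function names `translate_codon` and `translate` are illustrative and are not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one codon; N is expanded to all four bases.

    If every expansion encodes the same residue, that residue is
    returned; otherwise the residue is reported as X.
    """
    options = [BASES if base == "N" else base for base in codon]
    residues = {CODON_TABLE["".join(p)] for p in product(*options)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna):
    """Translate a DNA string in frame 1; whitespace is ignored.

    Only A, C, G, T and N are handled in this sketch.
    """
    dna = "".join(dna.upper().split())
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


# First 30 nucleotides of SEQ ID NO: 795 as listed above.
print(translate("ATATCATATT GGGCAAGCAG TTCACTGGAT"))  # ISYWASSSLD
```

Applied to positions 151-162 of the same listing (ACTGCNTTTTCG), the sketch gives TAFS, in line with residues 51-54 of SEQ ID NO: 796.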

ORF122a (SEQ ID NO: 796) and ORF122-1 (SEQ ID NO: 794) show 96.9% identity in 256 aa 
overlap: 

10 20 30 40 50 60 

orf 122a. pep ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 

I II MM I III 1 1 1 II II IMM II I II 1 1 MM Mill MM III III II MM 1 1 

30 orf 122-1 ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 122a. pep SSCW I FLSFGKPYQQTAA I LTFFXTSCPPRSNPYQQYRRLRLYAFHAPE ITEFFVGFAF 

1 1 M 1 1 1 1 It 1 1 1 1 1 1 It 1 1 1 1 1 1 IMIIMI lllllllllllll IMMIIIMII 

35 orf 122 - 1 SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 122a . pep XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 

1 1 1 1 1 1 M 1 1 1 1 1 M I M Ml II I M I II MM M II 1 1 1 M M II M 1 1 1 1 M II 1 1 

40 orf 122-1 DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 

130 140 150 160 170 180 

190 200 210 ■ 220 230 240 

orf 122a . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

1 1 II 1 1 1 1 1 1 1 II M 1 1 II 1 1 1 II 1 1 1 1 M 1 1 1 1 1 II 1 1 1 1 1 1 1 II 1 1 1 1 III II M I 

45 orf 122-1 FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

190 200 210 220 230 240 

250 

orf 122a. pep DIVALSDTDVRHRLCSX 

I I I I I I I I I II II I I I 
50 orfl22-l DIVALSDTDVRHRLCSX 

250 
Homology with a predicted ORF from N. gonorrhoeae 

ORF122 (SEQ ID NO: 792) shows 89.6% identity over a 182 aa overlap with a predicted ORF 
(ORF122ng) (SEQ ID NO: 798) from N. gonorrhoeae: 

orf122.pep                                TAFSAALRLSPSXLVIFLSFGKPYQQTAAI  30 
                                          |||||| ||| |  ||||||||||||||||
orf122ng    FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI  80 

orf122.pep  LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR  90 
            ||||||| ||||| ||||||||||||||||||||||||||| ||||   |||||||||||
orf122ng    LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR 140 

orf122.pep  NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 150 
            ||| | |||||||||||| ||||||||||||||||||||||||||||| |||| ||||||
orf122ng    NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT 200 

orf122.pep  EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182 
            ||||||||||| ||   |||||||||||||||
orf122ng    EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 256 
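
The identity figure quoted for this comparison can be reproduced by counting matching positions over the overlap. The sketch below is an illustration only: it assumes an ungapped overlap, counts a position as identical only when both residues are the same letter, and uses the 182 residues of ORF122 together with residues 51-232 of ORF122ng as read from the listings in this example (with the doubled V at ORF122ng residues 64-65 taken from the nucleotide sequence). The name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Return (matches, overlap length, percent identity) for two
    equal-length, already-aligned sequences. Gap characters never match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches, len(seq_a), 100.0 * matches / len(seq_a)


# ORF122 (SEQ ID NO: 792), residues 1-182.
ORF122 = (
    "TAFSAALRLSPSXLVIFLSFGKPYQQTAAI"
    "LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR"
    "NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT"
    "EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ"
)

# ORF122ng (SEQ ID NO: 798), residues 51-232.
ORF122NG = (
    "TAFSAAMRLSSSCVVIFLSFGKPYQQTAAI"
    "LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR"
    "NVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT"
    "EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQ"
)

matches, overlap, pct = percent_identity(ORF122, ORF122NG)
print(f"{matches}/{overlap} identical residues = {pct:.1f}% identity")
```

With these inputs the count is 163 of 182 positions, which rounds to the 89.6% stated above.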



The complete length ORF122ng nucleotide sequence (SEQ ID NO: 797) is: 

1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC 
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa 
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT 
201 TTTAtCCttt gGGAAaccct atcaAcaAAc agccgccatc TTAACATTTT 
251 TTTGCACGtC ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc 
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG 
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC 
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC 
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC 
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC 
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT 
751 CGTCATCGTT TGTGTTCCTG A 



This encodes a protein having amino acid sequence (SEQ ID NO: 798): 

1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR 
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC 
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT 
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI 
251 RHRLCS* 



ORF122ng (SEQ ID NO: 798) and ORF122-1 (SEQ ID NO: 794) show 92.6% identity in 256 aa 
overlap: 
10 20 30 40 50 60 

orf 122-1. pep ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS 

:|l I I I I I I I I M I I I I I I I I I I I I I I I I I I M ! I I M I I I I I I I I I I I I I II I I 
orf 122ng MSYRASSSPDFLEVETAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 
5 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 122-1 .pep SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 

1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 Mill I M 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 i 1 1 1 

orf 122ng SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 
10 70 80 90 100 110 120 



130 140 150 160 170 180 

orf 122 - 1 . pep DVDARNVYAQIGGDVGTHLRNVRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 

hi llh M I I I I I I I I I I I II II II II MM Nihil II II II II II I II II III I 
orf 122ng DIDARNIDTQIGGDVGTHLRNVRCEFGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRI 
15 130 140 . 150 160 170 180 

190 200 210 220 230 240 

orf 122 - 1 . pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

1 1 1 1 1 1 M 1 1 M 1 1 1 1 1 M M 1 1 1 1 1 M : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I 

orf 122ng FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV 
20 190 200 210 220 230 240 

250 

orf 122-1 .pep DIVALSDTDVRHRLCSX 

I I I I I I I I I : I I I I I I I 
orfl22ng DI VALSDTD I RHRLCSX 

25 250 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 95 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 799): 

1 ..GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 
51 CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 
101 TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 
151 ATGGGGCGGA TTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 800; ORF125): 



1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP 
51 MGGFDCRLFR LETA* 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 801): 



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 
101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 
201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 
301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 
351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 
401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 
451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 
501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 
551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 
601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 
651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 
701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 
751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 
801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA 
851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 
901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 
951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 
1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 
1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 
1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 
1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 
1201 TCTTTACAAA GGAACCCGTC ATGA 
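
Sequence listings in this format give a starting position followed by blocks of ten residues per row. A small helper can join such rows into one contiguous string and confirm that each stated starting position agrees with the running length, which is a convenient check on transcription. This is a minimal sketch under stated assumptions: the row pattern and the name `join_numbered_rows` are illustrative, and rows containing placeholder dots (as in the partial sequences) are not handled.

```python
import re

# A row is a start position followed by letter blocks separated by spaces.
ROW = re.compile(r"^\s*(\d+)\s+([A-Za-z\s]+?)\s*$")


def join_numbered_rows(rows):
    """Concatenate numbered sequence rows, checking each start position.

    Blank rows are skipped. A row that does not parse, or whose stated
    start position is not one past the residues read so far, raises
    ValueError.
    """
    chunks = []
    length = 0
    for row in rows:
        if not row.strip():
            continue
        match = ROW.match(row)
        if match is None:
            raise ValueError(f"unparseable row: {row!r}")
        start = int(match.group(1))
        chunk = "".join(match.group(2).split())
        if start != length + 1:
            raise ValueError(
                f"row starts at {start}, expected {length + 1}"
            )
        chunks.append(chunk)
        length += len(chunk)
    return "".join(chunks)


# First three rows of SEQ ID NO: 801 as listed above.
rows = [
    "1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT",
    "51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC",
    "101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT",
]
sequence = join_numbered_rows(rows)
print(len(sequence))
```

The position check is a deliberate design choice: a dropped or duplicated block of ten residues shifts every later row and is reported at the first row where the numbering no longer agrees.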

This corresponds to the amino acid sequence (SEQ ID NO: 802; ORF125-1): 

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP 
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL 
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF 
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ 
401 SLQRNPS* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF125 (SEQ ID NO: 800) shows 76.5% identity over a 51aa overlap with an ORF (ORF125a) 
(SEQ ID NO: 804) from strain A of N. meningitidis: 

10 20 30 

orf 125 . pep AGASANNI SARFAETPVAVSVTLIGTVLAV 

40 I I ' I I I I I I I • : : I I • I I : I : • • I I : I I I 

orf 125a KILLGAGLGAAGILAWLSTVTTTFLDAYSAGVSANNI SAKLSEI P IAVAVAWGTLLAV 

250 260 270 280 290 300 

40 50 60 

orf 12 5 .pep MLPVTE YENFLLL I GS VFAPMGGFDCRLFRLETAX 
45 : | | | | | | | | | | | | M | | | | | | : 

or f 1 2 5 a LLP VTEYENFLLL I GS VFAPMAAVL I ADFFVLKRREE I EG 

310 320 330 340 



The ORF125a partial nucleotide sequence (SEQ ID NO: 803) is: 
1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 
101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 
201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 
301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 
351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 
401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 
451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT 
501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 
551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 
601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 
651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 
701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 
751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 
801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 
851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT 
901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 
951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 
1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C.. 

This encodes a protein having the partial amino acid sequence (SEQ ID NO: 804): 

1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP 
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV 
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG.. 

ORF125a (SEQ ID NO: 804) and ORF125-1 (SEQ ID NO: 802) show 94.5% identity in 347 aa 
overlap: 

10 20 30 40 50 60 

35 orf 125a. pep MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

IMIIII I h 1 1 1 1 1 1 1 1 1 1 1 1 1 M II I i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 

orf 125-1 MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

40 orf 125a . pep AY I GALTGXXSMESVRLS FGKRGSVLFS VANMLQLAGWTAVM I YAGATVS SALGKVLWDG 

I I I I I I I I II I I I I I I I I I I I II I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 
orfl25-l AY I GALTGRS SME S VRLS FGKRGSVLFS VANMLQLAGWTAVM I YAGATVS SALGKVLWDG 

70 80 90 100 110 120 

130 140 150 160 170 180 

45 orf 125a. pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAXVXD 

1 1 1 h 1 1 1 1 1 1 1 1 1 1 ' 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MINIMI I I 

orf 125-1 ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD 

130 140 150 160 170 180 

190 200 210 220 230 240 

50 orf 125a .pep GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWM YALGLAAALF 

II llllllllllll IIMIIIIIIII IIMMIIIIII'IIIIIIIIIIIMIIII 

or f 1 2 5 - 1 GMS FGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWM YALGLAAALF 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 125a . pep TGETDVAKI LLGAGLGAAGI LAWLSTVTTTFLDAYSAGVSANNI SAKLSEI PI AVAVAV 

M M I I I I ! I I I I I I I I I I M I M I I I I I I M : I I I M I II I I h : H hlhh: 
5 or f 12 5 - 1 TGETDVAK I LLGAGLGAAG I LAWLS TVTTT FLDA YSAGAS ANN I SARFAETP VAVGVTL 

250 260 270 280 290 300 

310 320 330 340 

orf 125a . pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLI ADFFVLKRREE I EG 
:||: I I h , I I I I I I i I I i I I I I I I I I I I I I I M I I I I I I I I I II 
1 0 orf 125 - 1 IGTVLAVMLPVTEYENFLLLIGSVFAP^4AAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 

310 320 330 340 350 360 

Homology with a predicted ORF from N. gonorrhoeae 

ORF125 (SEQ ID NO: 800) shows 86.2% identity over a 65aa overlap with a predicted ORF 
(ORF125ng) (SEQ ID NO: 806) from N. gonorrhoeae: 

15 orf 125. pep AGASANNISARFAETPVAVSVTLIGTVLAV 30 

IIIIIIIIMIIII lllhllll Mill 
or f 12 5ng KI LLGAGLG I TGI LAWLS TVTTT FLDT YSAGAS ANN I S ARFAE I PVAVGVTL I RTVLAV 308 

or f 1 2 5 pep MLPVTE YENFLLL I GSVFAPM - GGFDCRLFRLETA 6 4 

I I I I II M II I I II llhll llllllll hll 
20 orfl25ng MLPVTEYKNFLLLIRSVFGPMAGGFDCRLFCLKTA 343 

An ORF125ng nucleotide sequence (SEQ ID NO: 805) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 806): 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA* 

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 807): 

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 
51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 
101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 
151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 
201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 
301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 
351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 
401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC 
451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 
501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 
551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 
601 CCGCTGGCCG GCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 
651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 
701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 
751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 
801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 
851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 
901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 
951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 
1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC 
1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 
1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 
1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC 
1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence (SEQ ID NO: 808; ORF125ng-1): 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 
301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD 
351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT 
401 QSLQRNPS* 

ORF125ng-1 (SEQ ID NO: 808) and ORF125-1 (SEQ ID NO: 802) show 95.1% identity in 408 aa 
overlap: 

10 20 30 40 50 60 

orf 125-1 .pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

I IIMIIIIMI MllilMMMIIIMIIIIIIIlMIIIIIMM MINI 

orf 125ng-l MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
30 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 125-1 .pep AY I GALTGRSSMES VRLS FGKRGS VLFS VANMLQLAGWTAVM I YAGATVS SALGKVLWDG 

lllllllllllllllllllll 1 1 M 1 1 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 II 1 1 1 1 1 1 1 1 1 1 

orfl25ng-l AY I GALTGRSSMES VRLS FGKCGS VLFS VANMLQLAGWTAVM I YVGATVS SALGKVLWDG 

35 70 80 90 100 110 120 

130 140 150 160 170 179 

orf 125-1 .pep ESFVWWALANGALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
I I I I M I I I I I I I I I I I I I I I h I I I I I I M I I I I I I I M M:| 1 I- : -I II 
orf 125ng-l ESFVWWALANGAL I VLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVF AS SGTNAAPAVS 

40 130 140 150 160 .170 180 

180 190 200 210 220 230 239 

orf 125- 1 . pep DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 

1 1 1 : 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 1 1 1 1 1 1 1 i 1 1 1 1 1 i I j 1 1 1 ! 1 1 1 ! ! 1 1 1 1 1 1 

orf 125ng-l DGMTFGTAVELSAVMPLSWLPLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL 
45 190 200 210 220 230 240 

240 250 260 270 280 290 299 

orf 125-1. pep FTGETDVAKILLGAGLGAAGILAWLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT 

1 1 1 1 1 1 1 1 1 i M 1 1 : 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 1 M M I M 1 1 1 1 1 1 1 T MINI 

orfl25ng-l FTGETD VAK I LLGAGLG I TG I LAWLS TVTTT FLDT YS AGAS ANN I SARFAE I P VAVGVT 

50 250 260 270 280 290 300 
300 310 320 330 340 350 359 

orf 125-1 .pep LIGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

I I I I I I I M I I I I I : I I I I I , I I I I I II I M I I . I M I I I I I I I I I I I I I I I I I I I I 
orf 125ng-l LIGTVLAVMLPVTEYKNFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAG 

310 320 330 340 350 360 

360 370 380 390 400 

orf 125 - 1 . pep FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

I I I II I I ! I I I I I I I I I I I I I I I I I I I I I I I I : I I I I II I I I I I I I I 

orf 125ng- 1 . FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 

370 380 390 400 

Based on this analysis, including the presence of putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 96 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 809): 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 
51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 
151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 
201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 
351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 
451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG.. 

This corresponds to the amino acid sequence (SEQ ID NO: 810; ORF126): 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA 
51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ 
151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK... 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 811): 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 
51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 
151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 
201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 
351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 
551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 
601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 
651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 
701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 
801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 
901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 
951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 
1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 
1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 
1101 A 

This corresponds to the amino acid sequence (SEQ ID NO: 812; ORF126-1): 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 
51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA 
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT 
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA 
351 PERDKESGLA YIRRQD* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF126 (SEQ ID NO: 810) shows 90.0% identity over a 180aa overlap with an ORF (ORF126a) 
(SEQ ID NO: 814) from strain A of N. meningitidis: 



10 20 30 40 50 60 

orf 126 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 
30 | | | | | | | | | | | | M | | | | | | | | | | | | | | | | | | : | | | | | | | | | | | | | | | | | M | : | | M I 

orf 126a MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12 6 .pep EWRLGRQS I PLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 

35 I I II II I I MIIMMIMM HI I I I I I I I I I I I I I hi I I I M I M I HI I 

orf 126a EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

70 80 90 100 110 120 

130 140 150 160 170 180 

or f 12 6 . pep VRWRADDI AEREPQLGGRFXDGI YLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 

40 M II II II II II II II I II II MM II II Ih I II II II III II II IMIhll 

orf 126a VRWRADDI AEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF126a nucleotide sequence (SEQ ID NO: 813) is: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 
51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 
151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 
201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 
251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 
301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 
351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 
551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 
601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 
651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 
701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 
801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 
901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 
951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 
1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 
1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 
1101 A 



This encodes a protein having amino acid sequence (SEQ ID NO: 814): 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 
51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK 
101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 
201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT 
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA 
351 PERDEESGLA YIRRQD* 



ORF126a (SEQ ID NO: 814) and ORF126-1 (SEQ ID NO: 812) show 95.4% identity in 366 aa 
overlap: 
10 20 30 40 50 60 

orf 126a . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

1 1 1 1 1 II II II M II II Ml 1 1 II II II III II II II III II Ml II II MM M M 

orf 126 - 1 MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 



40 



70 80 90 100 110 120 

orf 126a . pep EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

MINIM III IMIMMI Ml 1 1 1 1 1 1 1 1 1 M I M M M 1 1 1 M 1 1 1 1 1 II I I 

orf 126-1 EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 

70 80 90 100 110 120 



45 



130 140 150 160 170 180 

orf 126a . pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

II M M 1 1 1 1 1 1 1 1 1 1 1 II M 1 1 1 1 1 II 1 1 II M I II 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I 

orf 126 - 1 VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 

130 140 150 160 170 180 



50 



190 200 210 220 230 240 

orf 126a . pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 126-1 GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 



190 200 210 220 230 240 

250 260 270 280 290 300 

orf 126a. pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 

MINIM 1 1 I ' M I M M 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 hi 1 1 1 1 1 1 1 1 1 1 M 1 1 M I 

5 orf 126-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 126a. pep LNHHNPE IRYNRARRLI E INGLFRHGFMI S PAVTAAAVRLAVALFDGKXAPERDEESGLA 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I = I I I I I 
10 orf 126-1 LNHHNPE I RYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA • 

310 320 330 340 350 360 

orf 126a. pep YIRRQDX 
I II M M 

15 orfl26-l YIRRQDX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF126 (SEQ ID NO: 810) shows 90% identity over a 180 aa overlap with a predicted ORF 
(ORF126ng) (SEQ ID NO: 816) from N. gonorrhoeae: 

orf 126 .pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 

20 Ml hill III II I II 1 1 II IIIM MIM Mill II MM M Mill MM II 

orf 126ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

orf 126 .pep EWRLGRQS I PLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 12 0 

I h II I II II II II II II 1 1 M 1 1 1 1 M 1 1 1 1 1 1 M 1 1 1 M 1 1 1 1 M 1 1 1 1 Mill 

orf 126ng EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 12 0 

25 orf 126 .pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 180 

I I I j I I H I I I I I I I I I I I I I I I I I I I MUM: M I 1 I I 1 I I I I I I M M I I I : I : 
orf 126ng VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 180 

An ORF126ng nucleotide sequence (SEQ ID NO: 815) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 816): 

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 
201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS 
251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA 
351 PERDEESGLA YIGRQD* 

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 817): 



1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 
51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 
101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 
151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 
201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 
351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 
551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 
601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 
651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 
701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 
801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 
901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 
951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 
1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 
1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 
1101 A 



This corresponds to the amino acid sequence (SEQ ID NO: 818; ORF126ng-1): 

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA 
351 PERDEESGLA YIGRQD* 



ORF126ng-1 (SEQ ID NO: 818) and ORF126-1 (SEQ ID NO: 812) show 95.1% identity in 366 aa 
overlap: 
10 20 30 40 50 60 

orf 12 6 - 1 . pep MTRIAILGGGLSGRLTALQLAEQGYQI ALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

I I I I : I M M I I II I I I I I : I I I II HIM h I I I I I I I I I M I I I I M I I I I I ! 
orf 12 6ng- 1 MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 



40 



70 80 90 100 110 120 

orf 126 - 1 . pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 

I I : I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I II I I I M 
orf 126ng-l EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 126-1 .pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 

II II h II 1 1 II II M ! 1 1 II 1 1 li 1 1 II II , 1 1 II I! I 1 1 1 M II I II II I lh I: 

orf 126ng-l VRWRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQ 

,130 140 150 160 170 180 






190 200 210 220 230 240 

orf 126-1 .pep GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

I II I I : I I M I I I I I ■ I I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I 
orf 126ng-l DLQAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 
190 200 210 220 230 240

250 260 270 280 290 300 

orf 126-1. pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 

I , I M Ml 1 1 1 M 1 1 1 1 1 1 1 II M Ml IN 1 1 M 1 1 M : 1 1 M 1 1 1 1 1 1 ! I hi 1 1 1 

orf126ng-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 126-1 .pep LNHHNPE I RYNRARRL I E I NGLFRHGFM I S PAVTAAAARLAVALFDGKDAPERDKESGLA 

III Mil Ih I III 1 1 II II III II MM II 111 = 1 II II II 1 1 II II 1 1 hi 1 1 II 

orf126ng-1 LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA

310 320 330 340 350 360 

orf126-1.pep YIRRQDX
             || ||||
orf126ng-1   YIGRQDX

Furthermore, ORF126ng-1 (SEQ ID NO: 818) shows homology to a putative Rhizobium oxidase
flavoprotein (SEQ ID NO: 1163):

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli]
Length = 327

Score = 169 bits (423), Expect = 3e-41

Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%)

Query: 3 R I AVLGGGLSGRLTALQLAEQGYQ I ELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62 
RI V G G++G A QL G+++ L ++ G 
25 Sbjct: 2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60 

Query: 63 IRLGRQSIPLWRGIRCRLNTLTI^QENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122 

+ LGR + W + G+L+V G+D F R G DE+ 

Sbjct: 61 LTLGRLAADWWEAA LPGHVHRRGTLWAGGRDTGELDRFSRRTS - GWEWLDEVA- 113 

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182 
30 I A EP L GRF ++ E LD RQ L+ALA L++ + + 

Sbjct: 114 IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG WGES 165 

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242 

+D V+DC G LRG+RGE+ V T E++L+RPVRLLHPR+P+Y 

Sbjct: 166 DVDHDRWDCTGAA QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218 

35 Query: 243 I APKENHVFV I GATQ I ES ESQAPAS VRSGLELLS ALYAVHP AFGEAD I LE I AAGLRPTLN 3 02 

I P++ + F++GAT IES+ P + RS +ELL+A YA+HPAFGEA + E AG+RP 
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278 

Query: 303 HHNPE I RYSRERRL I E INGLFRHGFM ISP 331 
+ P R ++E R + +NGL+RHGF+++P 
40 Sbjct: 279 DNLP- -RVTQEGRTLHVNGLYRHGFLLAP 305 
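
The percentages in the BLAST summary line above follow directly from the reported counts. As an illustrative sketch only (the helper name `parse_blast_summary` and the regular expression are assumptions introduced here, not part of the original analysis), the following Python parses such a summary line and recomputes each percentage. Integer truncation is assumed because it reproduces the reported figures.

```python
import re

# Hypothetical helper: matches fields such as "Identities = 112/329 (34%)".
FIELD = re.compile(r"(\w+) = (\d+)/(\d+) \((\d+)%\)")


def parse_blast_summary(line):
    """Return {field: (count, total, reported_pct, recomputed_pct)}."""
    result = {}
    for name, count, total, pct in FIELD.findall(line):
        count, total, pct = int(count), int(total), int(pct)
        # Assumption: percentages are truncated, not rounded to nearest.
        result[name] = (count, total, pct, count * 100 // total)
    return result


summary = ("Identities = 112/329 (34%), Positives = 163/329 (49%), "
           "Gaps = 25/329 (7%)")
stats = parse_blast_summary(summary)
for name, (count, total, reported, recomputed) in stats.items():
    print(f"{name}: {count}/{total} reported {reported}% "
          f"recomputed {recomputed}%")
```

Under this assumption the recomputed values agree with the reported 34%, 49% and 7%.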

This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 97 
The following DNA sequence, believed to be complete, was identified in N. meningitidis (SEQ ID
NO: 819):

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC
201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG
301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA
351 TGAAAATCTA GTAACCTTTA ^TTTGCAAGA AGTCCGCCAG TTCGTGTAGT
401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA
451 GTAG

This corresponds to the amino acid sequence (SEQ ID NO: 820; ORF127): 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA

51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 

101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 

151 * 
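
The correspondence between a nucleotide sequence and its predicted amino acid sequence can be checked by conceptual translation. The sketch below is illustrative only: it assumes the standard genetic code, reads from the first base, and renders any codon containing an unresolved base (such as the ".." in SEQ ID NO: 819) as X. The function name `translate` is introduced here for illustration and is not part of the original analysis.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA in frame 1; unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop spacing, accept lower case
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


# First 60 nucleotides of SEQ ID NO: 819 as listed above.
first_60 = ("ATGACTGATA ATCGGGGGTT TACGCTGGTT "
            "GAATTAATAT CAGTGGTCTT GATATTGTCT")
print(translate(first_60))  # MTDNRGFTLVELISVVLILS
```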

Further work revealed the following DNA sequence (SEQ ID NO: 821):

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This corresponds to the amino acid sequence (SEQ ID NO: 822; ORF127-1): 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF127 (SEQ ID NO: 820) shows 98.0% identity over a 150aa overlap with an ORF (ORF127a) 
(SEQ ID NO: 824) from strain A of N. meningitidis: 



40 10 20 30 40 50 60 

orf 12 7 .pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

1 1 1 1 M 1 1 1 1 1 1 1 :i 1 1 1 1 M 1 1 1 1 1 1 1 ! I i I M 1 1 1 1 h 1 1 i M I i ' M 1 1 1 1 1 1 II 

orf 127a MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 
70 80 90 100 110 120

orf 127. pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 

I I I I I I I I I I I I ; I II I I I I I I I I I I II I I I I I I I I I I I I I M I I I il I I I I I I 
orf 12 7a GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 

70 80 90 100 110 



130 140 150 

orf 127 .pep VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 

I I I I I I i 1 I II I I I I I I I I I I I I I I I I I I 
orf 127a . VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 
10 120 130 140 150 



The complete length ORF127a nucleotide sequence (SEQ ID NO: 823) is: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC
201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG



This encodes a protein having amino acid sequence (SEQ ID NO: 824): 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK*



ORF127a (SEQ ID NO: 824) and ORF127-1 (SEQ ID NO: 822) show 99.3% identity in 149 aa 
overlap: 



30 t 10 20 30 40 50 60 

orf 127a .pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 
I I I I I 1 I i I I I I I I I I I I I I I , I I I II I I I I I I I I II I M I I I I I I I I I II I I I I I I 
orf 127-1 MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 



35 70 80 90 100 110 120 

orf 12 7a. pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

Mllllll IIMIIIMMIMIIIIIMIIII MINIMUM llllllllllll 

orf 127-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

70 80 90 100 110 120 






130 140 150 

orf 127a. pep TF I CKKSASSCSDGLDYFKGNDKDCKLLKX 

I I M I I M M I I I I I I I I I I I I I I II I 
orf 127-1 TF I CKKSASSCSDGLDYFKGNDKDCKLLKX 

130 140 150 
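
The identity figure quoted for this alignment can be reproduced by counting matching positions. The following is a minimal sketch, not the program used for the original analysis: the function name `percent_identity` is an assumption, the sequences are transcribed from the listings above (reading the N-terminal region as ELISVVLILS, as encoded by the corresponding nucleotide sequences), and gap positions, if present, are simply excluded from the count.

```python
def percent_identity(seq_a, seq_b):
    """Percent identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Skip columns where either sequence has a gap character.
    columns = [(a, b) for a, b in zip(seq_a, seq_b)
               if a != "-" and b != "-"]
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Residues 1-50 differ at a single position (T in ORF127a, A in ORF127-1).
ROW1_ORF127A = "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENA"
ROW1_ORF127_1 = "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENA"
# Residues 51-149 are listed identically for both sequences.
SHARED = ("HFMEKFYLQNGRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLK"
          "AVAIDKDKNPFIIKMNENLVTFICKKSASSCSDGLDYFKGNDKDCKLLK")

orf127a = ROW1_ORF127A + SHARED
orf127_1 = ROW1_ORF127_1 + SHARED
print(len(orf127a), round(percent_identity(orf127a, orf127_1), 1))
```

With one mismatch in 149 residues this gives 148/149, which rounds to the 99.3% stated above.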



Homology with a predicted ORF from N. gonorrhoeae

ORF127 (SEQ ID NO: 820) shows 97.3% identity over a 150 aa overlap with a predicted ORF
(ORF127ng) (SEQ ID NO: 826) from N. gonorrhoeae:

orf 127 . pep MTDNRGFTLVEL I SWL I LS VLAL I VYPS YRNYVEKAKI NAVRAALLENAHFMEKFYLQN 60 

I I I I I I I I I I I I I I II I I I I I I I I I I M I I I I M I I I I I I I h I I i I I I I I I II I I I 
orf 127ng MTDNRGFTLVEL I SWL I LS VLAL I VYPS YRNYVEKAKI NAVRAAFLENAHFMEKFYLQN 60 

orf 127 .pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFI IKMNENL 120 

I II 1 1 I IMI II i II I! I III I III I II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 

orf 12 7ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFI IKMNENL 119 

orf 12 7. pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150 

I I I I I I I I I I I I ! I I I ! I I I I I I I I I I I 
orf 12 7ng VTFICKKSASSCSDRLDYFKGNDKDCKLLK 14 9 

The complete length ORF127ng nucleotide sequence (SEQ ID NO: 825) is:

1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC
201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC
251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG

This encodes a protein having amino acid sequence (SEQ ID NO: 826): 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK*

ORF127ng (SEQ ID NO: 826) and ORF127-1 (SEQ ID NO: 822) show 100.0% identity in 149 aa 
overlap: 

10 20 30 40 50 60 

orf 127-1 .pep MTDNRGFTLVEL I SWL I LS VLAL I VYPS YRNYVEKAKINAVRAALLENAHFMEKFYLQN 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
orf 127ng-l MTDNRGFTLVEL I SVVL I LSVLAL I VYPS YRNYVEKAKINAVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 127-1 .pep GRFKQTSTKWPSLP I KEAEGFC I RLNG I ARGALDS KFMLKAVAI DKDKNPF 1 1 KMNENLV 

lllllll II lllllliriMIIIIIIIII MINIMI MINIM II III MINIMI 

orf 127ng-l GRFKQTSTKWPSLP I KEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFI I KMNENLV 

70 80 90 100 110 120 

130 140 150 

orf 127-1 .pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 127ng-l TFICKKSASSCSDGLDYFKGNDKDCKLLKX 

130 140 150 
This analysis, including the fact that the predicted transmembrane domain is shared by the
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
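
The text does not state how the transmembrane domain was predicted. As an illustration only, the sketch below applies one common heuristic, a Kyte-Doolittle hydropathy average over a 19-residue sliding window with a threshold of 1.6, to the first 40 residues of ORF127-1 (read as ELISVVLILS in the N-terminal region, as encoded by the nucleotide sequence). The choice of method, window and threshold, and the function names, are assumptions and not the authors' procedure.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def hydrophobic_windows(seq, window=19, threshold=1.6):
    """1-based start positions (and means) of windows above threshold."""
    return [(i + 1, round(mean, 2))
            for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean > threshold]


# First 40 residues of ORF127-1 (SEQ ID NO: 822).
ORF127_1_NTERM = "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKIN"
for start, mean in hydrophobic_windows(ORF127_1_NTERM):
    print(f"window starting at residue {start}: mean hydropathy {mean}")
```

Windows whose mean exceeds the threshold mark a candidate membrane-spanning stretch; a dedicated prediction program would normally be used to confirm it.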

Example 98 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 827):

1 ..GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT
51 CAACCAAATG CGGAAAACC£ GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT
101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA
151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC
201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG
251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC
301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT
351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG
401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA
451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT
501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT
551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT
601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT
651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC
701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG..

This corresponds to the amino acid sequence (SEQ ID NO: 828; ORF128): 



1 ..VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN
51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS
101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK
151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL
201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 829): 



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 
1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA
1851 CGGCGGCGCA TTGCAGTAG 

This corresponds to the amino acid sequence (SEQ ID NO: 830; ORF128-1): 



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSHGGA LQ*

Computer analysis of this amino acid sequence gave the following results: 



Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number
U32723) (SEQ ID NO: 1164)



ORF128 (SEQ ID NO: 828) and HI0392 (SEQ ID NO: 1164) show 52% aa identity in 180aa 
overlap: 



Orf128: 1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60

++L S IAS IF+Y DFN++RKT+EL+ FLSN YLG QGYFDLSA+ENPVLHIWSLAV

HI0392: 46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128: 61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120

E Q I KK + ++VL I++ILF IL A+SF+ + FY ++L+QPN YYLS

HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180

LRFPELL GSLLA+Y N + Q + +L+ L L +CLF+++ + FIPG+T

HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

Homology with a predicted ORF from N. meningitidis (strain A)

ORF128 (SEQ ID NO: 828) shows 98.0% identity over a 244aa overlap with an ORF (ORF128a) 
(SEQ ID NO: 832) from strain A of N. meningitidis: 



10 20 30

orf128.pep VSLASVIASQIFLYEDFNQMRKTVELSAVF

1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 M 1 1! II 1 1

orf128a ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF

60 70 80 90 100 110






40 50 60 70 80 90 

orf 128 . pep LSN I YLGFQQGYFDLS ADENP VLH I WS LAVEEQYYLLYPLLL I FCCKKTKSLRVLRNI S I 

MIMMMMMMMMMIMM IMIMMMM Illlllllll IIIIMI 

orf 128a LSNIYLGFQQGYFDLSADENPVLH I WS LAVEEQYYLLYPLLL I FCCKKTKSLRVLRNI SI 

120 130 140 150 160 170 






100 110 120 130 140 150 

orf 128 . pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 

I MMMMMIMMMMMMMIMMMIIMM MMIIIMI MIIMM 

orf 128a ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 
180 190 200 210 220 230 






160 170 180 190 200 210 

orf 128 .pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 

illllllllllll MMIIIIIMIIIIIIIIIMI llllllllllil IIIMM! 

orf 128a RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 
240 250 260 270 280 290 






220 230 240 

orf 128 . pep VFVGKI S YSLYL YHW I F I AFAPL I RGGKQLGLPA 

llllllllllil Mill I I IIIIMI 
orf 12 8a VFVGKI S YSLYL YHW I F I AFAHYI TGDKQLGLPAVS AVAALTAGFS LLS YYL I EQPLRKR 

300 310 320 330 340 350 



orf 128a KMTFKKAFFCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSH 
360 370 380 390 400 410 

The complete length ORF128a nucleotide sequence (SEQ ID NO: 831) is:



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT

451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT

501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT

551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG 

1851 CGACGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 832): 

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKSSRDGA LQ*

ORF128a (SEQ ID NO: 832) and ORF128-1 (SEQ ID NO: 830) show 99.5% identity in 622 aa
overlap:

orf 128a. pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 

I I I I I I I I I I I I I ] I I I I M I II I I I I I I I I I M M Ml I I I I I I I i I I I I I I I I 
orf 128-1 MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 

orf 128a . pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

50 I I I I I I I M I I I I I I I I I I I I I I I I I I M I I I I I I II I I I I II I I I I I I I I I I I I I I I I I 

orf 128-1 SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 

orf 128a . pep ' QQGYFDLSADENP VLH I WSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNIS I ILFLILTA 
I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I'l M I I I I I I I I I I I I I I I I I I I I 
orf128-1 QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA

orf 12 8a . pep TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

MIIMI lllllll IIIIIIIIIIIIMIIIIMIIIIIIII MMIMIMIIMIMI 

orf 12 8 - 1 SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 
orf 12 8a. pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i i 1 1 1 1 1 1 1 1 1 i 1 1 ! 1 1 1 1 1 1 1 1 1 F 1 1 1 1 1 

orf 12 8-1 FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
orf 12 8a . pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 

MM III INI III II I MM III I III III INI III MM III INI III INI III I 

orf 12 8-1 SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF 



orf 12 8a . pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 

1 1 1 1 1 1 1 1 M 1 1 1 1 1 M 1 1 1 1 1 1 1 M M I 1 1 1 1 

orf 12 8 - 1 FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 












orf 12 8a. pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

II I 1 1 1 1 1 j II I ! II 1 1 1 1 1 1 1 1 M MINIM 

orf 128-1 DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 
orf 12 8a. pep PVPRFEAQS FL I PGFPARFRETVKR I AAVKPVYVFANNTS I SRS PLREEKLKRFAANQYL 

lllllll lllllll MIIIIIMI MM II llllllll llllllllllllll INI III 

orf 128-1 PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

orf 12 8a. pep RPIQAMGDIGKSNQAVFDLIKDIPNVHVTVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

II M I II I I I I I I I II I II I I II I I I I I I I II I I II I I II I I I I I II I I II I I.I I II II I 
orf 128- 1 RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

orf 12 8a. pep YMGREFHKHERLLKSSRDGALQX 

IMMIMIIMM IM Mill 
orf 12 8-1 YMGREFHKHERLLKSSHGGALQX 



Homology with a predicted ORF from N. gonorrhoeae

ORF128 (SEQ ID NO: 828) shows 93.4% identity over 244 aa overlap with a predicted ORF 
(ORF128ng) (SEQ ID NO: 834) from N. gonorrhoeae: 



orf128.pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 30
1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 M I M
orf128ng ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112

orf128.pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90
llllllh lllllll MINI IMIIIIIIIIMIIMI llllllllllllll
orf128ng LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172

orf128.pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150
1 1 1 1 1 ! 1 1 1 1 1 1 : 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 ! 1 1 h 1 1 1 1 1 1 1 1 M 1 1 1 1 1 M IN
orf128ng ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232

orf128.pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210
Mill IIIIIIMIII IIMI IMIIIIIIIMIIIIIUMIIMIIIIII I
orf128ng RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292

orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
II IMIIIIIIIII MM I I MIMM
orf128ng VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR

The complete length ORF128ng nucleotide sequence (SEQ ID NO: 833) is: 



1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC 

151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA 

351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG

401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT
451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT 
501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT

551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC 

701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG 

751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC 

1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT 

1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG

1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC 

1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA

1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG

1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA 

1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG 

1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG

1851 AGGCGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence (SEQ ID NO: 834): 



1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT
51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN
101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY
151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY
201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV
251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR
351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN
401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG
551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY
601 YMGREFHKHE RLLKHSRGGA LQ*

ORF128ng (SEQ ID NO: 834) and ORF128-1 (SEQ ID NO: 830) show 95.7% identity in 622 aa 
overlap: 



orf128-1.pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
Illllllllllllll III 1 1 hi MM II II I II MM MM I'll II II hll II II II I
orf128ng MQAVRYRPEIDGLRAVAVLSVIIFHLNNRWLPGGFLGVDIFFVISGFLITNIILSEIQNG

orf128-1.pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
II II I II I M I II I II II I II II I II II I II 1 1 M II I II II II h 1 1 h M 1 1 1 M M I
orf128ng SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF

orf128-1.pep QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
: I I I I I I II I II I I I I I I I M I I I I II I II II I II I IMIIMIIIIIM Mill
orf128ng RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA

orf128-1.pep SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
II 1 1 h 1 1 1 1 1 II II M II 1 1 II I II 1 1 1 h M 1 1 II 1 1 1 II II II 1 1 1 1 1 M I II M
orf128ng SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC

orf128-1.pep FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
1 1 1 1 M 1 1 1 1 MM 1 1 1 M 1 1 li M 1 1 1 1 i I M 1 1 1 M M I M 1 1 U . 1 . 1 1 1 1 T
orf128ng FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128-1.pep SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
1 1 1 1 1 1 1 1 II M I II 1 1 1 II 1 1 1 1 1 1 II I M 1 1 1 II II 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1
orf128ng SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128-1.pep FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
llllllllhllllllhlllllllllllllhhMlhlllllllllllllMIIIII
orf128ng FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL

orf128-1.pep DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
I I I I : 1 1 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 I 1 1 1 ! 1 1 1 1 1 1 1 1
orf128ng DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128-1.pep PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
1 1 1 1 1 M I II 1 1 M I IMIIIIIIIIIIIIIIIIMIIIIIIIIIIIIIIIMI MM
orf128ng PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL

orf128-1.pep RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
I I h II II 1 1 1 M 1 1 1 1 1 h M I M II 1 1 1 1 1 1 II 1 1 II 1 1 h 1 1 1 II 1 1 II 1 1 1 1 1 1 II
orf128ng RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY

orf128-1.pep YMGREFHKHERLLKSSHGGALQX
Ml II MM II III hll IN
orf128ng YMGREFHKHERLLKHSRGGALQX
610 620



In addition, ORF128ng (SEQ ID NO: 834) shows homology to a hypothetical H. influenzae protein
(SEQ ID NO: 1164):



sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20)
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus
influenzae] Length = 245
Score = 239 bits (604), Expect = 3e-62

Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%)



Query: 38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKi<I^
+DIFFVISGFLIT II++EIQ SFS + FYTRRIKRIYP F+Y
Sbjct: 1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query: 98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
DFN++RKTIEL+ FLSN YLG GYFDLSA+ENPVLHIWSLAVE Q I
Sbjct: 61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
YKK + ++VL I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
Y N + Q +L++L L CLF+ + + + FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224





This analysis, including the identification of several putative transmembrane domains, suggests 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful
antigens for vaccines or diagnostics, or for raising antibodies. 
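
By way of illustration only, a putative transmembrane domain of the kind referred to above may be flagged by a sliding-window hydropathy scan. The following is a minimal sketch and not the analysis actually used for these sequences, which is not specified here. It assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a threshold of 1.6; the function names `window_hydropathy` and `candidate_tm_segments` are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=19):
    """Mean hydropathy of every full window; unknown residues (e.g. X) score 0."""
    scores = [KD.get(res, 0.0) for res in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    return [
        i + 1
        for i, h in enumerate(window_hydropathy(seq, window))
        if h >= threshold
    ]


if __name__ == "__main__":
    # A strongly hydrophobic stretch is flagged; a charged stretch is not.
    print(candidate_tm_segments("VALIYLLMTTFLGWIFLRL"))
    print(candidate_tm_segments("KRYNPQHRKRYNPQHRKRY"))
```

A window that passes the threshold is only a candidate; dedicated topology predictors apply further criteria.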



Example 99 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 835): 



1 ..ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT

251 TTCCGTTTTT CGTC..



This corresponds to the amino acid sequence (SEQ ID NO: 836; ORF129): 



1 ..IIYEYRWMFL YGALTTLGLT VVAXAGGSVL GLLLALARLI HLEKAGAPMR
51 VLAWALRKVS LLYVTLFRGT PLFVQIVIWA YVWFPFFV..

Further work revealed the complete nucleotide sequence (SEQ ID NO: 837): 



1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 838; ORF129-1): 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*
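
The correspondence between a nucleotide sequence and its encoded amino acid sequence can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the helper name `translate` is hypothetical, and any codon containing an ambiguous or unread base is rendered as X, matching the convention of the partial sequences above.

```python
# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate frame 1 of a DNA string; whitespace is ignored,
    a stop codon gives '*', any codon not made of A/C/G/T gives 'X'."""
    dna = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


if __name__ == "__main__":
    # First 30 nucleotides of SEQ ID NO: 837 as listed above.
    print(translate("ATGGATTTTC GTTTTGACAT TATTTACGAA"))
```

Grouping blanks in a listed sequence are discarded before translation, so a sequence can be pasted in as printed once line numbers are removed.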


Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF129 (SEQ ID NO: 836) shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a)
(SEQ ID NO: 840) from strain A of N. meningitidis: 

                          10        20        30        40        50
orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                    10        20        30        40        50        60

                60        70        80
orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
            ||||||||||||||||||||||||||||||||||
orf129a     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
                    70        80        90       100       110       120

orf129a     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
                   130       140       150       160       170       180

The complete length ORF129a nucleotide sequence (SEQ ID NO: 839) is: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT

301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This encodes a protein having amino acid sequence (SEQ ID NO: 840):

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a (SEQ ID NO: 840) and ORF129-1 (SEQ ID NO: 838) show 100.0% identity in 248 aa
overlap:

orf129a.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep  KRYNPQHRX
             |||||||||
orf129-1     KRYNPQHRX

Homology with a predicted ORF from N. gonorrhoeae 

ORF129 (SEQ ID NO: 836) shows 98.9% identity over an 88 aa overlap with a predicted ORF
(ORF129ng) (SEQ ID NO: 842) from N. gonorrhoeae:

orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW 54
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 60

orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV 88
            ||||||||||||||||||||||||||||||||||
orf129ng    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG 120

An ORF129ng nucleotide sequence (SEQ ID NO: 841) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 842): 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF
101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN
151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence (SEQ ID NO: 843): 

1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT 

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence (SEQ ID NO: 844; ORF129ng-1):

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK
51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI
101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA
151 ACSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL
201 AYVQNTITGR YSVYEEPLYT AALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 (SEQ ID NO: 844) and ORF129-1 (SEQ ID NO: 838) show 99.2% identity in 248 aa
overlap:



or f 12 9-1. pep MDFRFDI I YE YRWMFLYGALTTLGLTVVATAGGS VLGLLLALARL I HLEKAGAPMRVLAW 

I I M I I I I I I I I I I I I I I I II I I I I I II I I I I I i I I I I I I I I I I I I I I I I I I I M I I I I I 
orf 12 9ng-l MDFRFDI IYEYRWMFLYGALTTLGLTWATAGGSVLGLLLALARLIHLEKAGAPMRVLAW 

orf 12 9-1 .pep ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG 

IIIIIMIIIIIIIIIIIIMIIIIIIIMIIMIIMIIIIIIIIIIIIMIIIIIIII 

orf 12 9ng-l ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG 
orf 12 9-1 .pep SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS 

II II I MINIM I MINIM MINIMI 1 1 1 1 1 1 II 1 1 II I II 1 1 1 1 II 1 1 1 II II 

orf 129ng-l SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS 
or f 1 2 9 - 1 . pep EFI TLLKDS SLLS VI AVAELAYVQNT I TGRYS VYEEPLYTVAL I YLLMTTFLGW I FLRLE 

lllill lllllil I I IIIIUIMIIMIIIMMIIIIIIIIIIIIIIIII 

orf 12 9ng-l EFITLLKDSSLLSVI AVAE LA YVQNTITGRYSVYEEPLYTAALIYLLMTT FLGWIFLRLE 

orf 129-1. pep KRYNPQHRX 

Mill I 
orf 129ng-l KRYNPQHRX 
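
The identity figures quoted for these overlaps can be recomputed from any pair of aligned sequences. The following is a minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps; the function name `percent_identity` is hypothetical and the rounding to one decimal place is an assumption chosen to match the figures as printed.

```python
def percent_identity(aligned_a, aligned_b):
    """Return (identical positions, overlap length, percent identity to 1 d.p.)
    for two pre-aligned sequences of equal length; gap columns never count
    as identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    overlap = len(aligned_a)
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return identical, overlap, round(100.0 * identical / overlap, 1)


if __name__ == "__main__":
    # Ten-residue window around one of the differences shown in the alignment.
    print(percent_identity("ARSLGLTYPQ", "ACSLGLTYPQ"))
```

Two differences in a 248-residue overlap give 246/248, which rounds to the 99.2% stated above.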

In addition, ORF129ng-1 (SEQ ID NO: 844) is homologous to an ABC transporter (SEQ ID NO:
1165) from A. fulgidus:

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP) [Archaeoglobus
fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30
Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)



Query: 65 VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124 

+S YV + RGTPL VQI + I +F P+ GI + E A G +AL 

Sbjct: 58 I S TAYVE V I RGT P L L VQ I L I VYFGLPAIGINLQPEPA GIIAL 99 

Query: 125 I ANSGAY I CE I FRAG I QS I DKGQMEAACSLGLT YPQAMRYVI LPQALRRMLPPLAS E F I T 184 

SGAYI EI RAGI+SI GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI 
Sbjct: 100 S ICSGAYIAEI VRAGI ES I PIGQMEAARSLGMTYLQAMRYVI FPQAFRNILPALGNEFIA 159 

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242 

LLKDSSLLSVI++ EL V I P AL YL+MT L + +K+ 

Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217 
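
The whole-number percentages in the report above are consistent with truncation, rather than rounding, of each count divided by the alignment length (103/178 is 57.9% but is printed as 57%). The following is a minimal sketch under that assumption; the function names `blast_percent` and `summary_line` are hypothetical and are not part of any BLAST program.

```python
def blast_percent(count, length):
    """Whole-number percentage obtained by integer truncation (assumed)."""
    return 100 * count // length


def summary_line(identities, positives, gaps, length):
    """Format counts in the style of the report's summary line."""
    return (
        f"Identities = {identities}/{length} ({blast_percent(identities, length)}%), "
        f"Positives = {positives}/{length} ({blast_percent(positives, length)}%), "
        f"Gaps = {gaps}/{length} ({blast_percent(gaps, length)}%)"
    )


if __name__ == "__main__":
    print(summary_line(86, 103, 18, 178))
```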

This analysis, including the identification of transmembrane domains in the two proteins, suggests 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 



Example 100 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 845): 



1 . . CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA 

51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 

101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 

151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA 

201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcTAgT 

251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 

301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 

351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 

401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC

451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC

501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 

551 CGGGCGAATG CGTTTACAGA CGATCCGGAr Tar 



This corresponds to the amino acid sequence (SEQ ID NO: 846; ORF130): 



1 ..LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI
51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL
101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA
151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE*



Further work revealed the complete nucleotide sequence (SEQ ID NO: 847): 



1 ATGCGGCGGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA 

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC 

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC 

901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 

951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 
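
A partial sequence containing ambiguity codes (such as the y, w and r symbols in SEQ ID NO: 845) can be located within a complete sequence by treating each code as a set of permitted bases. The following is a minimal sketch, assuming the IUPAC nucleotide codes and treating "." as any base; the function name `find_partial` is hypothetical. Because a partial read may also contain miscalled bases, an exact search of this kind is illustrative only and may fail on a full-length partial sequence.

```python
# IUPAC nucleotide ambiguity codes; "." is treated as any base (an assumption).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", ".": "ACGT",
}


def find_partial(partial, complete):
    """Return the 1-based position in `complete` at which `partial` first
    matches, honouring ambiguity codes in `partial`, or None if absent."""
    p = "".join(partial.split()).upper()
    c = "".join(complete.split()).upper()
    for start in range(len(c) - len(p) + 1):
        if all(c[start + i] in IUPAC.get(ch, "") for i, ch in enumerate(p)):
            return start + 1
    return None


if __name__ == "__main__":
    # Short fragments taken from the partial and complete listings above.
    print(find_partial("ACTTATTACy TGCTCC", "GCACTTATTA CCTGCTCCAA"))
```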

This corresponds to the amino acid sequence (SEQ ID NO: 848; ORF130-1): 



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF130 (SEQ ID NO: 846) shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a)
(SEQ ID NO: 850) from strain A of N. meningitidis:

10 20 30 

orf 130 .pep LKECRLKDPVF I PNIVYKNI AITFLLLHAA 

35 ' | | | | | | | | M | | | | : | | | | | | | M | | | | M 

or f 1 3 0a LNLLRAQVHLNMAAVNFVSVRVS I LLGAEALKECRLKDPVF I PNWYKNI AITFLLLHAA 

140 150 160 170 180 190 

40 50 60 70 80 90 

orf 130 .pep AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 
40 | | | | | | | | | | | | | : | | | | | | | | | | | || | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 

orf 13 0a AELWLPAQTAGFTS LAVGF I LLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 

200 210 220 230 240 250 

100 110 120 130 140 150 

orf 130 .pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 
45 | | | | | | | | | | | | | | | | | | | | : | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 

orf 130a LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 
260 270 280 290 300 310 

orf 130 .pep



160 170 180 190 

FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX 



orf 130a 




320 330 340 350 






The complete length ORF130a nucleotide sequence (SEQ ID NO: 849) is:



1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA 

551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT 

601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT 

801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC 

901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT 

951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG 

1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This encodes a protein having amino acid sequence (SEQ ID NO: 850):



1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
351 AFTDDPE*



ORF130a (SEQ ID NO: 850) and ORF130-1 (SEQ ID NO: 848) show 98.3% identity in 357 aa
overlap:



orf130a.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf130-1     AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orf130a.pep  YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
             |||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||
orf130-1     YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
             ||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf130-1     LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep  IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE
             |||||||||||||| |||||||||||||||||||||||||:||:|||||||||||||
orf130-1     IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE

Homology with a predicted ORF from N. gonorrhoeae 

ORF130 (SEQ ID NO: 846) shows 91.7% identity over a 193 aa overlap with a predicted ORF 
(ORF130ng) (SEQ ID NO: 852) from N. gonorrhoeae: 



orf 130 .pep LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30 

1 1 1 II II I M I II MM 1 1 1 1 1 llllll 

orf 13 0ng LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFI PNVI YKNI AIT- LLLHAA 201 

orf 130 .pep AELWLPAQTAGFTALAVGF I LLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90 

I I II II 1 1 1 1 1 1 M II 1 1 II II II II II II I M II II II II II 1 1 1 1 M I llllll 

or f 1 3 0 ng AELWLPAQTAGFTALAVGF I LLAKLRELHHHELLRKHYVRT YYLLQLFAAAGYLWTGAAK 261 

orf 13 0 . pep LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150 

IMIIIIMIIIIIIMI IMIMI MM IMIMIII II II II II lllhlllll 

orf 13 0ng LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321 

orf 130 .pep FLXNVNPXF F I TVP A I LTAAVFVL YL FX F I P I FRANAFTDDP E 193 

I 1 1 1 1 llllll II II II M I MM M 1 1 II II II II I 

orf 13 0ng VLMNVNPI FF I TVPE I LTAAVFMLYLLTFVP I FRANAFTDDPE 364 

An ORF130ng nucleotide sequence (SEQ ID NO: 851) was predicted to encode a protein having 
amino acid sequence (SEQ ID NO: 852): 

1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL
101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
351 VPIFRANAFT DDPE*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 853): 



1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT 

51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG 

151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT 

201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC

251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT

351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG

401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG

451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA

551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG

601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA

651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA

701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC

751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC

801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC

851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC

901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG

1051 TTTACAGACG ATCCGGAATA A



This corresponds to the amino acid sequence (SEQ ID NO: 854; ORF130ng-1):

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL
51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC
101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA
201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG
251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI
301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA
351 FTDDPE*



ORF130ng-1 (SEQ ID NO: 854) and ORF130-1 (SEQ ID NO: 848) show 92.4% identity in 357 aa
overlap:






orf 130-1 .pep 
orf 130ng-l 
orf 130-1 .pep 
orf 130ng-l 
orf 130-1 .pep 
orf I30ng-l 
orf 130-1 .pep 
orf 130ng-l 
orf 130-1 .pep 
orf 130ng-l 
orf 130-1 .pep 
orf 130ng-l 



MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL 

1 1 1 1 1 1 I M 1 1 1 1 1 : 1 1 1 1 1 1 1 Mh 1 1 1 1 1 1 I M! 1 1 1 II 1 1 h 1 1 1 1 lllllll 

MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL 

KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA 

|: M! I h I M : h - I I || I : I I I II I I I I I I I I I I llllllllllllllllll 
KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA 

AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV 

I I I I I I I I I I ! I I I M I I I II M I I I M I i M I I : I I I I M I I I I I I I I I I I - 
AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI 

YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 

IMIMI 1 1 II 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 i 1 1 1 1 1 1 ! 

YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ 
LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMWLTAGLWHSGFTKLDYPKLCR 

! I ! 1 1 1 1 1 1 1 ! 1 1 1 3 1 1 1 1 1 ! 1 1 1 ! lllllllllllllllllllllllllll 

LFAAAGYLWTGAAKIjQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR 
I AVP I LFAAAVSRAFLMNVNP I FF I TVPAI LTAAVFVLYLFTF I P I FRANAFTDDPEX 

III lllhlllll lllllllllllll llllllhllhlhllllllllllllll 

I AVS I L FAS AVS RAVLMNVNP I F F I TVPE I LTAAVFML YLLT FVP I FRANAFTDD PEX 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 101 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 855):

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C . TGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAG. . 

This corresponds to the amino acid sequence (SEQ ID NO: 856; ORF131): 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K. . 

Further work revealed the complete nucleotide sequence (SEQ ID NO: 857): 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA



This corresponds to the amino acid sequence (SEQ ID NO: 858; ORF131-1): 



1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 



ORF131 (SEQ ID NO: 856) shows 95.0% identity over a 121 aa overlap with an ORF (ORF131a)
(SEQ ID NO: 860) from strain A of N. meningitidis: 




10 20 30 40 50 60 

orf 131 .pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

M 1 1 1 1 1 1 1 1 1 ' 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 h 1 1 Ml 1 1 II 1 1 1 1 1 Ml 1 1 1 1 1 1 : I 

orf 131a MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 
5 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 131 .pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 

MINIMI I M M 1 1 1 1 1 1 Ml M IM 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 II 1 1 1 MM 

orf 13 la YE I PLSDGNRS VRANE YES AQQS YFYRKI GKFEACGLDWRTRDGKPL I ETFKQEGFDCLK 

10 70 80 90 100 110 120 

orf 131. pep K 
I 

or f 1 3 1 a KQGLRRNGLS ERVRWX 

15 130 

The complete length ORF131a nucleotide sequence (SEQ ID NO: 859) is:

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA

351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC

401 GATGGTAA

This encodes a protein having amino acid sequence (SEQ ID NO: 860): 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW*

ORF131a (SEQ ID NO: 860) and ORF131-1 (SEQ ID NO: 858) show 97.0% identity in 135 aa 
overlap: 

35 orf 131a. pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

orf 13 la . pep YE I PLSDGNRSVRANEYESAQQSYFYRKI GKFEACGLDWRTRDGKPL I ETFKQEGFDCLK 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 MMM 

40 orf 13 1 - 1 YE I PLSDGNRSVRANEYESAQQSYFYRKI GKFEACGLDWRTRDGKPL I ETFKQGGFDCLE 

orf 13 la . pep KQGLRRNGLSERVRWX 

M MMIMIMM 

or f 1 3 1 - 1 KQGLRRNGLSERVRWX 



Homology with a predicted ORF from N. gonorrhoeae

ORF131 (SEQ ID NO: 856) shows 89.3% identity over a 121 aa overlap with a predicted ORF
(ORF131ng) (SEQ ID NO: 862) from N. gonorrhoeae:

orf 131 . pep MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60 

Mlhlllll llhlllllllllllllll 1 1 : i M 1 1 1 1 M 1 1 1 1 1 1 1 1 Ml II I 

orf 131ng MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60 

orf 131 .pep YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120 

Illllllll M I I I I II I I : I I I I I I I M II llllllllllllhl III llllll 
or f 13 lng YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120 

orf 131. pep K 121 
I 

orf 13 lng KQGLRRNGLS ERVRW 134 

A complete length ORF131ng nucleotide sequence (SEQ ID NO: 861) was predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 862): 

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 863): 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT

151 GGCGGCGAGA GtCCgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC

401 GATGGTAA

This corresponds to the amino acid sequence (SEQ ID NO: 864; ORF131ng-1):

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

ORF131ng-1 (SEQ ID NO: 864) and ORF131-1 (SEQ ID NO: 858) show 92.6% identity in 135 aa
overlap:

orf 13 lng- 1 .pep MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED 

Mlhlllll I hi 1 1 1 1 M 1 1 1 1 1 1 M 1 1 h 1 1 1 1 1 1 1 1 1 M 1 1 II 1 1 1 1 1 II I 

orf 131-1 MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 

40 orf 13 lng- 1 .pep YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 

I I I I I I I I I I I I I I I I I Ihl I I I II ' M I I I I I I I I I I I I I I I h Ml llllll 
or f 1 3 1 - 1 YE I PLSDGNRS VRANE YESAQQSYFYRKIGKFEACGLDWRTRDGKPLI ETFKQGGFDCLE 

orf 131ng-l .pep KQGLRRNGLS ERVRWX

I I I I I II I I I I I I I I 
orf 13 1 - 1 KQGLRRNGLS ERVRWX 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
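
By way of illustration only, a site of this kind can be searched for with a sequence pattern. The following is a minimal sketch, not the analysis actually performed. It assumes the PROSITE-style consensus {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C, with the additional rules that the cysteine lies between positions 15 and 35 and that at least one Lys or Arg occurs in the first seven residues. The pattern and rules are quoted from memory of PROSITE entry PS00013 and should be verified against that entry; the function name `lipid_attachment_cysteine` is hypothetical.

```python
import re

# Assumed lipoprotein lipid attachment consensus (see the caveat above).
LIPOBOX = r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C"
LIPOBOX_LENGTH = 11  # residues covered by the pattern, ending at the cysteine


def lipid_attachment_cysteine(protein):
    """Return the 1-based position of the candidate lipid-modified cysteine,
    or None if no site satisfies the assumed pattern and rules."""
    seq = protein.upper()
    if not re.search("[KR]", seq[:7]):
        return None
    # A lookahead is used so that overlapping candidate sites are all examined.
    for match in re.finditer("(?=(" + LIPOBOX + "))", seq):
        cys_pos = match.start() + LIPOBOX_LENGTH
        if 15 <= cys_pos <= 35:
            return cys_pos
    return None


if __name__ == "__main__":
    # N-terminal region of ORF131-1 (SEQ ID NO: 858) as listed above.
    print(lipid_attachment_cysteine("MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPR"))
```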

Example 102 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 865):

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC
351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG
451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG
501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 
551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 
601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 
651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 
701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA 
751 AAAATTCGGC ACGGAACACG GCTGGCA . . 

This corresponds to the amino acid sequence (SEQ ID NO: 866; ORF132): 



1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 
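
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA in frame. The following is a minimal Python sketch, assuming the standard codon assignments and assuming that any codon containing an unresolved base (for example "." or "N") is reported as X, which is consistent with how the listings in this document appear to be written. The names `BASES`, `AMINO`, `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard codon table, built in TCAG order (an assumption; the source does
# not name the translation table that was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored, lowercase is accepted."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with an unresolved base are not in the table and become X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 60 nucleotides of SEQ ID NO: 865 as listed above.
start = "ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT TGCCGCCATT"
print(translate(start))
```

Translating the first 60 nucleotides of SEQ ID NO: 865 in this way reproduces the first 20 residues listed for ORF132 (SEQ ID NO: 866).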

Further work revealed the complete nucleotide sequence (SEQ ID NO: 867): 



1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT 

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA 






651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA 

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG 

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG 

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC 

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC 

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA 

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG 

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC 

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG 

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT 

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG 

This corresponds to the amino acid sequence (SEQ ID NO: 868; ORF132-1): 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR 

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD 

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE 

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA 

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV 

401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH 

451 GKLLEALR* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical o457 protein (SEQ ID NO: 1166) of E. coli (accession number
U14003)

ORF132 (SEQ ID NO: 866) and o457 (SEQ ID NO: 1166) show 58% aa identity in 140 aa overlap:

Orf132:   4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63
            IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE  GI++ +G+DA+QL+  +
o457:     3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132:  64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123
             D+ +IGN   RG   VEA+L   +PY+SGPQWL + VL   WVL VAGTHGKTTTA M
o457:    62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132: 124 AWVLEYAGLAPGFLIGGVXG 143
             W+LE  G  PGF+IGGV G
o457:   122 TWILEQCGYKPGFVIGGVPG 141





Homology with a predicted ORF from N. meningitidis (strain A) 

ORF132 (SEQ ID NO: 866) shows 74.6% identity over a 189 aa overlap with an ORF (ORF132a)
(SEQ ID NO: 870) from strain A of N. meningitidis:

10 20 30 40 50 60

orf 132 . pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

I M 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 ! 1 1 1 1 1 IIIIIIIIMIIIIIIIIII lllllhllll 

orf 132a MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

10 20 30 40 . 50 60 

70 80 90 100 110 120 

orf 132 . pep EFKADVYVIGNVAKRGMDWEAII^LGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 

MINIMUM] Mill I Mill II II II II llh II Mill 1 1 1 1 I MM II I 

orf 132a EFKADVYVIGNVAKRGMDWEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA 

70 80 90 100 110 120 

130 140 150 160 
orfl32.pep SMLAWVLE YAGLAPGFL I GGVXGKFR RFRP PAANAAPRPEQP I AVFR 

M I I I I II II I I I I MM : I Ml: I ::h I I 

or f 1 3 2 a SMLAWVLEYAGLAPGFX I GGVPENFS VS ARL - PQTPRQDPNSQS PFFVI EADE YDTAFFD 

130 140 150 160 170 

170 180 190 200 210 220 

orf 132 . pep HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL 
:||: :::| 

orf 132a KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD 
180 190 200 210 220 230 



The complete length ORF132a nucleotide sequence (SEQ ID NO: 869) is:

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG

801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC

1351 ACCAAACTGC TGGACGCTTT GAGATAG



This encodes a protein having amino acid sequence (SEQ ID NO: 870): 



1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV 
51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN

101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE

251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA

301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA

401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH

451 TKLLDALR*

ORF132a (SEQ ID NO: 870) and ORF132-1 (SEQ ID NO: 868) show 93.9% identity in 458 aa
overlap:



orf 132a. pep MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD 

I I I I I I I I H I I I I : i I I I I I M II llllllll IMMIMII MlllhllM 
15 orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 

orf 132a . pep EFKADVYV I GNVAKRGMDWEAI LNRGLPY I SGPQWLAENXLHHHWXLGVAXTHGKTTTA 

Illllllllllllllllllllllll llllllllllhll Mill I I I I MINIM 
orf 132 - 1 E FKAD VYV I GNVAKRGMD WEA I LNLGL P Y I S GPQWLS ENVLHHHWVLGVAGTHGKTTTA 

orf 132a. pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

20 II MUM III MM I llllll Ihl I Mill II Illllllllllllllllllllllll 

orf 132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 
orf 132a. pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT 

I 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 II I II 1 1 II h 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 

orf 132 - 1 RS KFVHYRPRTAVLNNLE FDHAD I FADLGAIQTQFHYLVRTVPS EGL I VCNGRQQSLQDT 

25 orf 132a . pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA 

II II I I II I I II M I I I II I I I I I I I I I I II I II II MM hill llllll II MM 
or f 1 3 2 - 1 LDKGCWTPVEKFGTEHGWQAGEANADGS FDVLLDGKTAGRVKWDLMGRHNRMNALAVI AA 

orf 132a . pep ARHAGVD I QTACEALSTFKNVKRRME I KGTANGI TVYDDFAHHPTAI ETT IQGLRQRVGG 
I I : I I I I I I I I I I h: I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I I I I I I II I I I 
30 orf 132 - 1 ARHVGVD I QTACEALGAFKNVKRRME I KGTANGI TVYDDFAHHPTAI ETT I QGLRQRVGG 

orf 132a . pep ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD 

I I I I I I I I I M I II I I I I M I M I I I I I I I I I I h I I I I I I I I I I I I I h IN I 
orf 132 - 1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 

orf 132a .pep FDAFVAE I VKNAEAGDH I LVMSNGGFGG I HT KLLDALRX 

35 | | | || I I II I I I h I I I II I II II I I I II I II h I I I I 

orf 132 - 1 FDAFVAE I VKNAEVGDH I LVMSNGGFGG I HGKLLEALRX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF132 (SEQ ID NO: 866) shows 89.6% identity over 259 aa overlap with a predicted ORF 
(ORF132ng) (SEQ ID NO: 872) from N. gonorrhoeae: 

40 orf 132 .pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60 

1 1 II 1 1 1 1 1 II 1 1 1 1 h I M 1 1 1 1 1 h 1 1 1 1 1 1 1 M II II 1 1 1 1 II II Ml III I Ih • 

orf 132ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60 
orf 132 .pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120

! I : M : II M I II : I I I I M I I I M I I I I I I I I I I I : I I I I II I I I I I I I I I I I I I I I I 
orfl32ng E FQAD I YV I GNVARRGMDWEAI LNRGL P Y I SGPQWLAENVLHHHWVLGVAGTHGKTTTA 120 

orf 132 .pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180 

5 I II II I II I III I III 1 1 II I IMIIIIIMII MM Mill II II I II 1 1 III 

or f 1 3 2 ng SMLAWVLE YAGLAPGFL I GGVPGKFRRFRP PTANAAS RPEQQ I AVFRHRS RR I RHRLFRQ 180 

orf 132 .pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 24 0 

h I I I I I M I I I M I I I I I I II llllllllll I I I : I = = I :||IIIIIIIMI 
o r f 1 3 2 ng TLQ I RALS PAYRRVEQSG I RPRRHLRRLGRDTDP VP P PRAHRT I RRPHRLQRTAAKPARY 240 

10 orf 132. pep FGQRLLDAGGKIRHGTRLA 259 

I I I I I I I I I I I I I | | M 
orfl32ng FGQRLLDAGGKIRHRTRLADW 261 

An ORF132ng nucleotide sequence (SEQ ID NO: 871) was predicted to encode a protein having
amino acid sequence (SEQ ID NO: 872):

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGIR

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG

251 KIRHRTRLAD W*

Further work revealed the following gonococcal DNA sequence (SEQ ID NO: 873): 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac

351 gaccaCcGcg tCCATGCTCG CCT-GGGTCTT GGAATATGCC GGACTCGCGC

401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC

451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCGGGTGGAA

751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC

1351 ACCAAACTGC TGGACGCTTT GAGATAG

This corresponds to the amino acid sequence (SEQ ID NO: 874; ORF132ng-1):

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA

401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH

451 TKLLDALR*


ORF132ng-1 (SEQ ID NO: 874) and ORF132-1 (SEQ ID NO: 868) show 93.2% identity in 458 aa
overlap:

orf 132ng-l.pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 

IIIIM IIIIIIIMIIIIIIIMIIMIi MINIM III hlllllllh 

orf 132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 
orf 13 2ng-l.pep E FQ AD I YV I GNVARRGMD WEA I LNRGL P Y I S G PQWLAENVLHHHWVLGVAGTHGKTTT A 

MMM MMMM MMMM 1 1 1 1 1 1 1 1 1 1 h 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 132-1 EFKADVYVIGNVAKRGMDWEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 

orf 132ng-l .pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK 

I I I I II I I I II I I I I I I I I I I I I I II I I I I I I I I II I I I I h I I I I I I I I I I I I I I II I I 
orf 132 - 1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK 

orf 132ng- 1 . pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT 

I M II 1 1 M 1 1 1 II 1 1 I M 1 1 1 1 1 1 M I M 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M I M I 

orf 132 - 1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT 
orf 132ng- 1 . pep LDKGCWTPVE KFGTGHGWQ I GEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAV I AA 

Illlllllllllll MM IhlllllllMIIII Ihl Mill 1 1 1 1 1 1 1 1 1 1 1 : 

orf 132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA 

orf 132ng- 1 . pep ARHAGVDVQTACEALGAFKNVKRRME I KGTANG I TVYDDFAHHPTA I ETT IQGLRQRVGG 

I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I E I I I I I I I I I I I I I 
orf 132 - 1 ARHVGVD I QTACEALGAFKNVKRRME I KGTANG I TVYDDFAHHPTA I ETT IQGLRQRVGG 

orf 132ng- 1 . pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD 

1 1 1 II 1 1 1 II 1 1 II 1 1 1 II 1 1 1 1 h 1 1 1 1 II 1 1 1 1 1 1 1 h 1 1 1 II 1 1 1 1 II I M MM 

orf 13 2-1 ARI LAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD 



orf 132ng- 1 . pep FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX 

I h I I I II I I I h : I I I I I I II I I I I I I I I I I h I I I I 
orf 132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX 



In addition, ORF132ng-1 (SEQ ID NO: 874) is homologous to a hypothetical E. coli protein (SEQ
ID NO: 1166):

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein in
fbp-pmba intergenic region [Escherichia coli] Length = 457

Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query: 22
           ++ Q +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ +IGN RG VE
Sbjct: 21

Query: 82
           A+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV
Sbjct: 80

Query: 142
           P NF VSA L +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH
Sbjct: 140

Query: 202
           ADIF DL AIQ QFHHLVR VP +G 1+ +L+ T+ GCW+ EG WQ
Sbjct: 191

Query: 262
           ++ D S ++VLLDG+K G V W L+G HN N L IAAARH GV A ALG+F N
Sbjct: 251

Query: 321
           +RR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI+AVLEPRSNTMK+G
Sbjct: 311

Query: 380
           K L SL AD+VF W VAE D DT +VK A+ GDHI
Sbjct: 371

Query: 439
           LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (SEQ ID NO: 868) (26.4 kDa) was cloned in pET and pGex vectors and expressed in
E. coli, as described above. The products of protein expression and purification were analyzed by
SDS-PAGE. Figure 20A shows the results of affinity purification of the His-fusion protein, and
Figure 20B shows the results of expression of the GST-fusion in E. coli. Purified His-fusion protein
was used to immunise mice, whose sera were used for FACS analysis (Figure 20C) and ELISA
(positive result). These experiments confirm that ORF132 (SEQ ID NO: 866) is a surface-exposed
protein, and that it is a useful immunogen.

Example 103 



The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 875):

1 ..CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT 

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA 

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA 

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT 

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence (SEQ ID NO: 876; ORF133): 



1 . . PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS 

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD 

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF* 

Further work revealed the further partial DNA sequence (SEQ ID NO: 877): 



1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 

301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA 

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG 

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA 

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG 

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT 

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA 

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT 

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG 

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT 

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA 

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG 

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG 

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC 

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC 
1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC 

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG 

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG 

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG 

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC 

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT 

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA 

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT 

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC 

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC 

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA 

1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC 

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT 

1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT 

1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT 

1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG 

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT 

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC 

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG 

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG 

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT 

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC 

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA
2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT
2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA 
2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA 
2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG 
2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA 
2651 TGAGCTACAA GTTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 878; ORF133-1): 



1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI 

51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG 

101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN

151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN

201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW

251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL

301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT

351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL

401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP

451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG

501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF 

551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT 

601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL 

651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS 

701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY 

751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA 

801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV 

851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with the probable TonB-dependent receptor HI121 of H. influenzae (accession
number U3280n) (SEQ ID NO: 1167)

ORF133 (SEQ ID NO: 876) and HI121 (SEQ ID NO: 1167) show 57% aa identity in 363 aa
overlap:

Orf133:  31 IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90
            I EP+L K G K+A NHS ++SA+  DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA
HI121:  563 INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622

Orf133:  91 LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150
            LKPE+++T+Q GF TYKKGL  QDD LG+KLVGYRS I NYIHNVYG WW     +P+W
HI121:  623 LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW--RDGMPTWA 680

Orf133: 151 SSTGLAYTIQHRXFXDKVHXXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNN 210
             S G  YTI H+ +   V          YD GRFF N+SYAYQ++ QPTN++DAS  PNN
HI121:  681 ESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAYQRTNQPTNYADASPRPNN 740

Orf133: 211 ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYID 270
            AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW   KLTLG A RY+GKS RAT EE YI+
HI121:  741 ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN 800

Orf133: 271 GTNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP 330
            G+     +  R+    ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP
HI121:  801 GSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKDLIIKAEVQNLLDKRYVDP 859

Orf133: 331 LDAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMS 390
            LDAGNDAA +RYYSS + + C D + C GG+ K+VL NFARGRT++++++
HI121:  860 LDAGNDAASQRYYS SL NNS I ECAQDS SAC GGSDKTVLYNFARGRTYI LSLN 910

Orf133: 391 YKF 393
            YKF
HI121:  911 YKF 913





Homology with a predicted ORF from N. meningitidis (strain A) 

ORF133 (SEQ ID NO: 876) shows 90.8% identity over a 392 aa overlap with an ORF (ORF133a)
(SEQ ID NO: 880) from strain A of N. meningitidis:

10 20 30 

orf 133 .pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI 

III III 1 I I i I I I I I I MINIM 
orf 133a FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI 
450 460 470 480 490 500 

40 50 60 70 80 90 

orf 133 .pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I M I I I I I . I I I I I M I I I I I I II 
orf 133a YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 
510 520 530 540 550 560 

100 110 120 130 140 150 
orf 133 .pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS

iiiiiiiiiiii iiiiiiiiiii iiiiiiiiiiiii Milium iiMiii ii 

orf 133a KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS 
570 580 590 600 610 620 

160 170 180 190 200 210 

orf 133 .pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 

IIIIIIIIIII I 1111= III llllll IIIIIIIMIII llllllllll 

orf 133a STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA 
630 640 650 660 670 680 

220 230 240 250 260 270 

orf 133 .pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 

llllll MM III II MM I llllll MM II IIIIIIIIIIII MM llllllllll I 

orf 133a SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX 
690 700 710 720 730 740 

280 290 300 310 320 330 

orf 133 .pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 

III IIIIIIIIIIII MINIUM I IIMIII 1 1 1 1 1 i I i 1 1 1 E 1 1 1 1 1 i I 

orf 133a TNGXXTSNFRQLGKRS IXQTETLARQPL I FDXYAAYE PKKXL I FRAE VKNLFDRRY I DPL 

750 760 770 780 790 800 



340 350 360 370 380 390 

orf 133 . pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 

1 1 1 1 1 1 M M 1 1 1 1 1 II M II Mill IMIIIIIIIIIIMIIIIMII IIMIMI 

orf 133a DAGNDAATQRY YS S FDPKDKDEEVTCNDDNTLCNGKYGGTS KS VLTNFARGXTFL I TMS Y 

810 820 830 84 0 850 860 



orf 133. pep KFX 
III 

orfl33a KFX 
870 



A partial ORF133a nucleotide sequence (SEQ ID NO: 879) is:

1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC

351 GGCGAATCTG CGGACTTTAN GCGTGGATGA . TGTCGTTCAG GGCAATANTA

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT

451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC

501 TGTCGGTGTG CTTTAGGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 

5 1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC 

14 01 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT 

14 51 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG 

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA 

10 1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT 

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC 

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT 

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC 

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC 

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG 

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA 

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT

2601 GAGCTACAAG TTTTAA 

This encodes a protein having (partial) amino acid sequence (SEQ ID NO: 880): 



1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG

151 NAMAAIGARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT

351 GWGLXKDFET YNNAKILDLX NTSTFRLPRE TELQTTLGFN YFHNEYGKNR

401 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF

451 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS

851 KSVLTNFARG XTFLITMSYK F*

ORF133a (SEQ ID NO: 880) and ORF133-1 (SEQ ID NO: 878) show 94.3% identity in 871 aa 
overlap: 






orf 133a .pep 



10 20 30 40 

KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS
||||||||||||||||||||| |||||||| |||||| | ||

orfl33-l EAQ I QVLEDVHVKAKRVPKDKKVFTDARAVSTRQD I FKSSENLDN I VRS I PGAFTQQDKS 

10 20 30 40 50 60 

50 60 70 80 90 100 

5 orf 133a. pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDWK 

II MINI lllllllllllllll Ml MIMIMMIIMM IIIMI lllllll 

orf 133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVK 

70 80 90 100 110 120 

110 120 130 140 150 160 

10 orf 133a .pep GSFSGSAGINSLAGSANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
or f 13 3 - 1 GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 

130 140 150 160 170 180 

170 180 190 200 210 220 

15 or f 1 3 3 a . pep ESGAS VGVLYGHSRRS VAQNYRVGGGGQH I GNFGAE YLERRKQRYFEQEGGLKFNSNSGK 

II IIIIIIIMMIIIIMIMIIIU llllllllllllllll llhllllhlll 

orf 133 - 1 ESGAS VGVLYGHSRRS VAQNYRVGGGGQH I GNFGAE YLERRKQRYFVQEGALKFNSDSGK 

190 200 210 220 230 240 

230 240 250 260 270 280 

20 orf 133a. pep WERDFQKS YWKTKWYQKYDAPQELQKY I EGHDKS WRENLAPQYD I TP I DPSS LKXQS AGN 

I hi- || | |::|: Mill I II II MM I MMMMMMM MMI 
orf 133 - 1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 

250 260 270 280 290 

290 300 310 320 330 340 

25 orf 13 3a . pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKI INRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMM 

orf 13 3 - 1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKI INRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 

300 310 320 330 340 350 

350 360 370 380 390 400 

30 orf 133a .pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 

IMMMMMI MMMMMMM I h M 1 1 1 M M M 1 1 1 1 M M M M M M 1 1 

orf 133 - 1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 
360 370 380 390 400 410 

410 420 430 440 450 460 

35 orf 133a . pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 

MMMMMI MMMMMMMMMMMMMMMMMMMMMMMM 

orf 133 - 1 EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
420 430 440 450 460 470 

470 480 490 500 510 520 

40 or f 13 3a . pep LNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI YEPVLKKYGKKRA 

MMMMMMM MMI MMMMMMMM 1 1 1 h II 1 1 II 1 1 1 1 1 1 1 1 1 1 1 

or f 13 3 - 1 LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRS CGI YEPVLKKYGKKRA 

480 490 500 510 520 530 

530 540 550 560 570 580 

45 orf 133a . pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 

1 1 1 1 1 1 H 1 1 1 1 1 M 1 1 1 11 1 1 1 1 1 N 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 

orf 133-1 NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590

590 600 610 620 630 640

orf 133a .pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF 

! 1 1 1 1 1 1 1 1 1 1 lllllllllllll lllllll IMMIN III MMIMMMIMI 

orfl33-l T YKKGLLKQDDTLGLKLVGYRS R I DNY I HNVYGKWWDLNGD I PSWVS S TGLAYT I QHRNF 

5 600 610 620 630 640 650 

650 660 670 680 690 700 

orf 133a . pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 

1 1 1 1 1 II II II 1 1 II II 1 1 II 1 1 1 M 1 1 II 1 1 1 1 1 1 M I 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 133 - 1 KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
10 660 670 680 690 700 710 

710 720 730 740 750 760 

orf 133a . pep RVS ALPRD YGRLE VGTRWLGNKLTLGGAMRYFGKS I RATAEERY I DXTNGXXTSNFRQLG 

I i 1 1 1 1 1 1 M 1 1 1 1 1 1 I M 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 M M III lllllll 

orf 133-1 RVS ALPRD YGRLE VGTRWLGNKLTLGGAMRYFGKS I RATAEERY I DGTNGGNTSNFRQLG 

15 720 730 740 750 760 770 

770 780 790 -800 810 820 

orf 133a. pep KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 

1 1 I I lllllllllllll MUM 1 1 1 1 I I I I i I M I I I 1 1 I M I 1 1 1 1 1 1 1 1 

orf 133 - 1 KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
20 780 790 .800 810 820 830 

830 840 850 . 860 870 

orf 13 3a . pep SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 

I I I I I I I I I : I I I I I : I I I I I I I I I I I I I I I I I I I I I I I h II II I I I 
or f 13 3 - 1 SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX 
25 840 850 860 870 880 

Homology with a predicted ORF from N. gonorrhoeae 

ORF133 (SEQ ID NO: 876) shows 92.3% identity over 392 aa overlap with a predicted ORF 
(ORF133ng) (SEQ ID NO: 882) from N. gonorrhoeae: 

orfl33 pep PGYYGSDDEFKRAFGENS PTXKKHCNRS CG I 31 

30 IMIMMIMI MMM I : I I : llh 

orf 133ng FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAFGENSPAYKEHCDPSCGL 560 

orf 133 .pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91 

M 1 1 1 1 1 1 M 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 M 

orf 133ng YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620 

35 orf 133 .pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNY I HNVYGKWWDLNGD I PSWVS 151 

I Mini II I II II 1 1 II I II II MM MM II 1 1 III MM II II I lllllll llh 

or f 1 3 3 ng KPERANTWQFGFNT YKKGLLKQDD I LGLKLVGYRSR I DNY I HNVYGKWWDLNGD I PSWVG 680 

orf 133 .pep S TGLAYT I QHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFS DAS ESPNNA 211 

I I I I I I 1 I : I I | ||||: I I II I II I II I II II I I I II I II II II I M I I I 

40 orf 133ng S TGLAYT I RHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFS DAS ESPNNA 74 0 



orf133.pep  SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf133ng    SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 800

orf133.pep  TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331

Illlllll llllllllllllllllllll II IIIIMIIIIIIIIIIIIIIIIIIIII 
orf 133ng TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPL 860 

orf 133 . pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 391 

lllll|::|IMIII III I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I i I 1 
orf 133ng DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920 

orf 133. pep KF 393 
II 

orfl33ng KF 922 

The complete length ORF133ng nucleotide sequence (SEQ ID NO: 881) is predicted to encode a 
protein having amino acid sequence (SEQ ID NO: 882): 

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLLNLEYD GVFNKYTAQF 

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 

901 SKSVLTNFAR GRTFLMTMSY KF* 

A variant was also identified, being encoded by the gonococcal DNA sequence (SEQ ID NO: 883): 



1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT 

51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG 

101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA 

151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca 

201 gGATGTGTTC AAATCCGGCG AAAACCTGGA CAACATCGTA CGCAGCATAC 

251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT 

301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT

351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT

401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT

451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG

501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA

551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA

601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC

651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT

701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT

751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT

801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA

851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC

901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA

951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC

1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT

1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA 

1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA 

1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT 

1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT

1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT 

1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC 

1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA 

1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC 

1451 CTGAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG

1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG

1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG 

1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC 

1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA 

1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT

1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG 

1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA 

1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 

1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 

1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA

2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 

2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 

2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 

2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 

2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT

2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 

2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 

2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 

2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT

2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG

2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 

2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 

2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG 

2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA 

2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC

2751 GATGAGCTAC AAGTTTTAA 

This corresponds to the amino acid sequence (SEQ ID NO: 884; ORF133ng-1):

1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV

51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN

101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD

151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK

201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY

251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY

301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF

351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF

401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN

451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT

501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY

551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM

601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL

651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH

701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY

751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG

801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN

851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT

901 SKSVLTNFAR GRTFLMTMSY KF*

ORF133ng-1 (SEQ ID NO: 884) and ORF133-1 (SEQ ID NO: 878) show 96.2% identity in 889 aa
overlap:



10 20 30 40 50 60 

or f 1 3 3 ng - 1 . pep S FRLKP I CFYLMGVML YHHS YAEDAGRAGSEAQI QVLEDVHVKAKRVPKDKKVFTDARAV 
5 I II I Ml I I I II I II I I I I I M I M I I 

orf 13 3 - i EAQ I QVLEDVHVKAKRVPKDKKVFTDARAV 

10 20 30 



70 80 90 100 110 120 

orf 133ng- 1 . pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
M I I Ml II M Ml I I I I I I I I I I I II II I I I II M I I I I I I M M I I I I I II II II M 
orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 

40 50 60 70 80 90 



130 140 150 160 170 180 

orf 133ng- 1 . pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 

100 110 120 130 140 150 



190 200 210 220 230 240 

orf 133ng- 1 . pep NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRGVAQNYRVGGGGQHI 

I I I I I I I I I I I I II II I II I I I I I I I I I I I M I I I II I I I I I I - I I M I I I I I I I I I 
orf 133-1 NTYGLLLKGLTGTNSTKGNAMAAIGARKWLESGASVGVLYGHSRRSVAQNYRVGGGGQHI

160 170 180 190 200 210 



250 260 270 280 290 300 

orf 13 3ng-l.pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 

25 I 1 1 I 1 1 1 1 1 1 I h I 1 1 1 1 I : I I I h I I I I I I I I I I I II I I I • I : : I I I I I I I I I 

orf 133-1 GNFGAEYLERRKQRYFVQEGALKFNSDSGKWERDLQRQQWKYKPYKNYNN-QELQKYIEE 

220 230 240 250 260 



310 320 330 340 350 360 

o r f 1 3 3 ng - 1 . pep HDKS WRENLAPQ YD I TP I DPSGLKQQS AGNLF KLE YDGVFNKYTAQ FRDLNTR I GSRKI I 
30 | | | | | || | | | | || | | | | | | | : | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | : | | | | | | I 

orf 133-1 HDKSWRENLXPQYDITP IDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKI I 

270 280 290 300 310 320 



370 380 390 400 410 420 

orf 133ng- 1 . pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 

I I I I I I I II II I I I I I I I M I I I II II I M I I M I II I I II M II I II Ml M I II I I 
orf 133 - 1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 
330 340 350 360 370 380 



430 440 450 460 470 480 

orf 133ng-l .pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 

1 1 1 1 II 1 1 1 1 1 1 II II 1 1 1 I M 1 1 1 II II 1 1 1 II 1 1 1 1 1 1 1 M 1 1 II I MM M I II 

orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
390 400 410 420 430 440 

490 500 510 520 530 540 

orf 133ng- 1 . pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 

1 1 1 1 1 1 1 II M I M 1 1 1 1 M 1 1 1 1 II 1 1 1 1 1 1 1 M-l I M 1 1 II 1 1 1 1 MM 1 1 1 II 

orf 133 - 1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500

550 560 570 580 590 600

orf 133ng-l .pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 

llllhlhlh M - - 1 1 li 1 1 1 1 M 1 1 1 1 1 1 1 1 1 M I 1 1 1 1 1 M M 1 1 1 1 II 1 1 

orf 133-1 GENSPTYKKHCNRSCGIYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNI 
5 510 520 530 540 550 560 

610 620 630 640 650 660 

orf 133ng- 1 . pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
I I I I I I I I I I I ! I I ' I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 
10 570 580 590 600 610 620 

670 680 690 700 710 720 

orf 133ng- 1 . pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 

I I I I I I I I I I I I I I I : I I I I I I I hi I I I I I I I I I I I I I ' I I I I I I I I I I M I I I I I I 
orf 133 - 1 VYGKWWDLNGDIPSWVSSTGLAYTIQHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 
15 630 640 650 660 670 680 

730 740 750 760 770 780 

orf 133ng-l .pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 

I | | | I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M I I I I I I I I I I I I I I I I 
orf 133 - 1 STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 

20 690 700 710 720 730 740 

790 800 810 820 830 840 

orf 133ng-l .pep YFGKS IRATAEERY IDGTNGGNTSNVRQLGKRS I KQTETLARQPLI FDFYAAYEPKKNL I 

IMIIIIIIIII IIIIIIIMI 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 : 

orf 133-1 YFGKS I RATAEERYIDGTNGGNTSNFRQLGKRS I KQTETLARQPLI FDFYAAYEPKKNL I 

25 750 760 770 780 790 800 

850 860 870 880 890 900 

orf 133ng- 1 . pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 

I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I M , I ■ I I I I I 1 I I I I I I I M 1 1 h I ! 
orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
30 810 820 830 840 850 860 

910 920 
orf 133ng-l .pep VLTNFARGRTFLMTMSYKFX 

I I I I I I II I I I I I I I I I I 
or f 13 3 - 1 VLTNFARGRTFLMTMSYKFX 
35 870 880 

In addition, ORF133ng-1 (SEQ ID NO: 884) is homologous to a TonB-dependent receptor (SEQ
ID NO: 1167) in H. influenzae:

sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog -
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913
Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%) 

Query: 38 QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97

+ L + V K + DKK FT+A+A STR++VFK + +D + +RSIPGAFTQQDK SG+V
Sbjct: 29 ETLGQIDVVEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGVV 88



Query: 98 SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDVVKGSFS 157

S+NIRG++G GRVNTMVDG+TQTFYST+ D+G+ +GGSSQFGA+ +D NFIAG+DV K +FS
Sbjct: 89 SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 148



Query: 158 GSAGINSLAGSANLRTLGVDDWQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLESGA 217 

G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 

Sbjct: 149 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

Query: 218 SVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 

VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 
Sbjct: 209 YVGWYGYSQREVSQDYR I - GGGERLASLGQD I LAKEKEAYF - RNAGY I LNP - EGQWTPD 265 

Query: 278 LQRQYWK TKWY KKYEDPQELQK YIEE 303 

L +++W +Y KK +D ++LQK IEE 

Sbjct: 266 LS KKHWSCNKPDYQKNGDCS Y YRI GS AAKTRRE I LQELLTNGKKPKD I EKLQKGNDG I EE 325 

Query: 304 HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 363 

DKS+ N QY + PI+P L+ +S +L K EY AQ R L+ +IGSRKI 

Sbjct: 326 TDKSFERN-KDQYSVAPIEPGSLQSRSRSHLLKFEYGDDHQNLGAQLRTLDNKIGSRKIE 384 

Query: 364 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 423 

NRNYQ NY + N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 

Sbjct: 385 NRNYQVN YNFNNNS YLDLNLMAAHN I GKT I Y P KGGF FAGWQ VADKL I TKNVAN I VD I NNS 444 

Query: 424 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYS Y - - LGRFKGDKG 481 

TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ D GLYS + GR+ G K 
Sbjct: 445 HTFLLPKEIDLKTTLGFNYFTNEYSKNRFPEELSLFYNDASHDQGLYSHSKRGRYSGTKS 504 

Query: 482 LLPQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKR 541

LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 
Sbjct: 505 LLPQRSVILQPSGKQKFKTVYFDTALSKGIYHLNYSVNFTHYAFNGEYVGY 555 

Query: 542 AFGENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMP 601 

EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

Sbjct: 556 ENTAGQQ INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMP 604 

Query: 602 NIQEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYI 661 

NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NY I 
Sbjct: 605 NIQEMFFSQVSNAGVNTALKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYI 664 

Query: 662 HNVYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAY 721

HNVYG WW +P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 

Sbjct: 665 HNVYGVWW- - RDGMPTWAESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAY 722 

Query: 722 QKSTQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGA 781 

Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A 
Sbjct: 723 QRTNQPTNYADASPRPNNASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLA 782 

Query: 782 MRYFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKN 841 

RY+GKS RAT EE YI+G+ + +R+ ++K+TE + +QP+I D + +YEP K+ 

Sbjct: 783 ARYYGKSKRATIEEEYINGSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKD 841 

Query: 842 LIFRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTS 901 

LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS + + C D + C GG+ 
Sbjct: 842 LI IKAEVQNLLDKRYVDPLDAGNDAASQRYYSSL NNSIECAQDSSAC GGSD 892 

Query: 902 KSVLTNFARGRTFLMTMSYKF 922 

K+VL NFARGRT++++++YKF 
Sbjct: 893 KTVL YNFARGRTY I LS LNYKF 913 




The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
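
The motif prediction mentioned above can be illustrated with a short pattern search. The sketch below assumes the PROSITE-style consensus for the ATP/GTP-binding site motif A (P-loop), [AG]-x(4)-G-K-[ST], and applies it to residues 751-800 of the gonococcal protein as listed for SEQ ID NO: 884. The function name `find_p_loops` is hypothetical, and whether the match it reports is the motif underlined in the original specification is an assumption, since the underlining does not survive in this text.

```python
import re

# Assumed consensus for the ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST], written here as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for each P-loop match.

    Whitespace in the listing (the 10-residue grouping) is ignored.
    """
    seq = re.sub(r"\s+", "", protein).upper()
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(seq)]


# Residues 751-800 of ORF133ng-1 (SEQ ID NO: 884), copied from the listing.
fragment = "GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG"

# Positions are relative to the fragment; add 750 for full-length numbering.
print(find_p_loops(fragment))
```

A short consensus such as this one matches many unrelated proteins by chance, so a hit is only a hint that needs supporting evidence.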

Example 104 

The following partial DNA sequence was identified in N. meningitidis (SEQ ID NO: 885) 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT . .

This corresponds to the amino acid sequence (SEQ ID NO: 886; ORF112):

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKNSVINVR EMLPDH . . . 
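
The correspondence between the partial DNA sequence and the amino acid sequence above can be checked by translating the first reading frame with the standard genetic code. The following is a minimal sketch: the helper name `translate` is hypothetical, the use of the standard codon table is an assumption, and any codon containing a character other than A, C, G or T is reported as `X`, which mirrors how ambiguous positions appear in the listings here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; '*' is a stop codon.

    Whitespace is ignored, a trailing partial codon is dropped, and codons
    with ambiguous bases (for example N) are translated as 'X'.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row of SEQ ID NO: 885, copied from the listing.
row_1 = "ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT"
print(translate(row_1))  # MNLISRYIIRQMAVMA
```

The printed residues agree with the start of SEQ ID NO: 886 (MNLISRYIIR QMAVMA...).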

Further work revealed a further partial nucleotide sequence (SEQ ID NO: 887):



1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 

951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG . . .



This corresponds to the amino acid sequence (SEQ ID NO: 888; ORF112-1):

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT

251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG

301 LKLFGGICXG LLFHLAGRLF GFTSQL . . .

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF112 (SEQ ID NO: 886) shows 96.4% identity over a 166aa overlap with an ORF (ORF112a) 
(SEQ ID NO: 890) from strain A of N. meningitidis: 

10 20 30 40 50 60 

orf 112 .pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 ! M 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 MMMI M 

orf 112a MNL I SRY 1 1 RQMAVMAVYALLAFLALYS FFE I LYETGNLGKGSYGI WEMXGYTALKMXAR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 112 .pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

MMMMMMMM I M 1 1 1 M : 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 : 1 , 1 1 1 II I M I 

orf 112a AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

70 80 90 100 110 120 

130 140 150 160 

orf 112 .pep VAPTLSQKAENI KAAAINGKI STGNTGLWLKEKNSVINVREMLPDH 

M I I I I I I I I I I I I I I I I I I I I I I I . I I I I I I M I I I M I I I I 
orf 112a . VAPTLSQKAEN I KAAAINGKI STGNTGLWLKEKNS I INVREMLPDHTLLG IKIWARNDKN 

130 140 150 160 170 180 

orf 112a ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVS I AAEEXWP I SVKRNLMDVLLVKP 

190 200 210 . 220 230 240 
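
The identity figure quoted for this overlap can be reproduced with a simple position-by-position comparison. The sketch below assumes an ungapped overlap starting at residue 1 of both sequences, which holds for this pair but not for gapped alignments elsewhere in the document, and it counts an `X` as a mismatch. The function name `percent_identity` is hypothetical, and the program that produced the original alignments is not stated here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the ungapped overlap of two sequences.

    Both sequences are compared from their first residue; the overlap is
    the length of the shorter one. 'X' (unknown residue) never counts as
    a match.
    """
    a = "".join(a.split())
    b = "".join(b.split())
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y and x != "X" for x, y in zip(a, b))
    return 100.0 * matches / overlap


# ORF112 (SEQ ID NO: 886), 166 residues, copied from the listing.
ORF112 = (
    "MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML "
    "GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL "
    "LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL "
    "KEKNSVINVR EMLPDH"
)

# First 200 residues of ORF112a (SEQ ID NO: 890), copied from the listing.
ORF112A = (
    "MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX "
    "GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL "
    "LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL "
    "KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ"
)

print(round(percent_identity(ORF112, ORF112A), 1))  # 96.4
```

Six of the 166 compared positions differ (four of them are `X` in ORF112a), giving 160/166, or 96.4%.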

The ORF112a nucleotide sequence (SEQ ID NO: 889) is:

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG 

151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG

451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACGC GACCAAATGT CCGTCGGCGA ACTGACCACC 
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751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 

851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 

901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG

1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGCTA A 

This encodes a protein having the amino acid sequence (SEQ ID NO: 890): 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX

51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL

151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ

201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT

251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG

301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI

351 RKQEKR*

ORF112a (SEQ ID NO: 890) and ORF112-1 (SEQ ID NO: 888) show 96.3% identity in 326 aa
overlap:

orfll2a.pep MNL I S R Y 1 1 RQMAVMAVYALLAFLAL YS FFE I L YETGNLGKGS YG I WEMXG YTALKMXAR 

1 1 1 1 1 ' 1 1 II II , 1 1 1 1 1 II 1 1 M 1 1 1 1 II 1 1 II II 1 1 1 1 1 1 1 1 1 lllllll II ' 

orf 112-1 MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 
orf 112a. pep AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

25 Illhlllllllllll M 1 1 Ml I hi II Illlllllllllllll Ml II II I II III 

orf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
orf 112a . pep VAPTLSQKAEN I KAAAINGKI STGNTGLWLKEKNS I INVREMLPDHTLLG I KI WARNDKN 

M M II I I ! 1 1 1 1 1 1 1 1 1 II II I 1 1 1 1 1 1 1 ! I III IIIMI llllll Mill 

orf 112-1 VAPTLSQKAEN I KAAAINGKI STGNTGLWLKEKNSX INVREMLPDHTLLG I KI WARNDKN 

30 orf 112a . pep ELAEAVEADS AVLNSDGSWQLKNIRRSTLGEDKVEVS I AAEEXWP I SVKRNLMDVLLVKP 

1 1 1 M I M 1 1 II 1 1 1 1 1 M 1 1 1 M 1 1 M 1 1 1 1 M M II I M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

or f 1 12 - 1 ELAEAVEADS AVLNSDGSWQLKNIRRSTLGEDKVEVS lAAEENWPI SVKRNLMDVLLVKP 

orf 112a . pep DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

Illlllllllllllll IMMIII MMMMMMIMIMM MIMMMIM 

35 orf 112 - 1 DQMSVGELTTYIRHLQNNSQNTRI YAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

orf 112a . pep LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 

II lllll lllllllllll lllll 
orfll2-l LKLFGG I CXGLLFHLAGRLFGFTSQL 

Homology with a predicted ORF from N. gonorrhoeae 

ORF112 (SEQ ID NO: 886) shows 95.8% identity over 166aa overlap with a predicted ORF
(ORF112ng) (SEQ ID NO: 892) from N. gonorrhoeae:



orf 112 .pep MNL I SRY 1 1 RQMAVMAVYALLAFLALYS FFE I LYETGNLGKGS YG I WEMLGYTALKMPAR 60 

1 1 1 1 1 1 1 1 1 1 1 1 M M M 1 1 1 M 1 1 1 II II 1 1 1 II I II 1 1 1 1 1 1 1 1 1 1 1 M 1 1 M M 1 1

orf 112ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60

orf 112 .pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 120 

MMMIMMIMMIMM MMMIIMIMMIMM I IMMMMIMM 

orfll2ng AYELMPI^VLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120 

5 orf 112 .pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166 

III II II 1 1 1 1 1 II Ml II 1 1 1 MM 1 1 1 M Ml M II I Mill 

or f 1 1 2 ng VAPTLSQKAEN I KAAAINGKI STGNTGLWLKEKTS I INVRGMLPDHTLLGI KI WARNDKN 180 
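The identity figure quoted for this comparison can be checked with a short calculation. The following is a minimal sketch in Python, assuming an ungapped alignment in which residue i of one sequence is paired with residue i of the other (as in the alignment above); the function name `percent_identity` is illustrative and is not part of the original analysis software.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical residues over the ungapped overlap of a and b."""
    overlap = min(len(a), len(b))
    if overlap == 0:
        raise ValueError("sequences must not be empty")
    # zip() stops at the shorter sequence, i.e. at the end of the overlap.
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / overlap


# ORF112 (166 aa) as shown in the alignment above.
orf112 = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR"
    "AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW"
    "VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH"
)

# First 180 residues of ORF112ng as shown in the alignment above.
orf112ng_prefix = (
    "MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR"
    "AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW"
    "VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN"
)

print(round(percent_identity(orf112, orf112ng_prefix), 1))  # 95.8
```

Seven of the 166 aligned positions differ, which gives 159/166, or 95.8%. Alignments containing gaps would need a proper alignment tool rather than this position-by-position comparison.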

The complete length ORF112ng nucleotide sequence (SEQ ID NO: 891) is: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT 
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 
151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT 
201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA 
251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 
301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT 
351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag 
401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG 
451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC 
501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG 
601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC 
651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG 
701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC 
751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT 
801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC 
851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC 
901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 
951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG 
1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA 
1051 CGCAAACAGG AAAAACGTTG A 

This encodes a protein having amino acid sequence (SEQ ID NO: 892): 



1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 
51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 
101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 
151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 
201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 
251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG 
301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI 
351 RKQEKR* 
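The relationship between the nucleotide sequence (SEQ ID NO: 891) and the encoded protein (SEQ ID NO: 892) can be illustrated by translating codons with the standard genetic code. The following Python sketch is illustrative only; the names `CODON_TABLE` and `translate` are assumptions introduced here, and the code does not represent the software actually used to derive the sequences. Lower-case bases are simply upper-cased before translation.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...), as in the usual NCBI table layout.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# First 30 bases and last 21 bases of SEQ ID NO: 891 as listed above.
print(translate("ATGAACCTGA TTTCACGTTA CATCATCCGC"))  # MNLISRYIIR
print(translate("CGCAAACAGG AAAAACGTTG A"))           # RKQEKR*
```

The two fragments reproduce the first ten residues and the C-terminal end (including the stop, shown as `*`) of the listed protein sequence.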

ORF112ng (SEQ ID NO: 892) and ORF112-1 (SEQ ID NO: 888) show 94.2% identity in 326 aa 
overlap: 



                      10        20        30        40        50        60
orf112ng      MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf112-1      MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf112ng      AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW
              |||| ||||||||| ||||||||||| |||||||||||||||||||||||||| ||||||
orf112-1      AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf112ng      VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN
              ||||||||||||||||||||||||||||||||| | |||| |||||||||||||||||||
orf112-1      VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf112ng      ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP
              |||||||||||||||||||||||||||  | || | | |||| ||| | |||||||||||
orf112-1      ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf112ng      DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG
              |||||||||||||||||||||| ||||||||||||| |||||||||||||||||||||||
orf112-1      DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf112ng      LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX
              |||||||| |||||||||||||||||
orf112-1      LKLFGGICXGLLFHLAGRLFGFTSQL
                     310       320

This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 105 

Table III lists several Neisseria strains which were used to assess the conservation of the sequence 
of ORF 4 (SEQ ID NO: 216) among different strains. 



TABLE III - List of Neisseria Strains Used for Gene Variability Study of ORF 4 (SEQ ID NO: 216) 

ORF4 gene variability: List of used Neisseria strains 

Identification number   Strains             Source / reference

Group B
zv01_4                  NG6/88              R. Moxon / Seiler et al., 1996
zv02_4                  BZ198               R. Moxon / Seiler et al., 1996
zv03_4ass               NG3/88              R. Moxon / Seiler et al., 1996
zv04_4                  297-0               R. Moxon / Seiler et al., 1996
zv05_4                  1000                R. Moxon / Seiler et al., 1996
zv06_4                  BZ147               R. Moxon / Seiler et al., 1996
zv07_4                  BZ169               R. Moxon / Seiler et al., 1996
zv08_4                  528                 R. Moxon / Seiler et al., 1996
zv09_4                  NGP165              R. Moxon / Seiler et al., 1996
zv10_4                  BZ133               R. Moxon / Seiler et al., 1996
zv11_4                  NGE31               R. Moxon / Seiler et al., 1996
zv12_4ass               NGF26               R. Moxon / Seiler et al., 1996
zv13_4                  NGE28               R. Moxon / Seiler et al., 1996
zv15_4                  SWZ107              R. Moxon / Seiler et al., 1996
zv16_4                  NGH15               R. Moxon / Seiler et al., 1996
zv17_4                  NGH36               R. Moxon / Seiler et al., 1996
zv18_4                  BZ232               R. Moxon / Seiler et al., 1996
zv19_4                  BZ83                R. Moxon / Seiler et al., 1996
zv20_4                  44/76               R. Moxon / Seiler et al., 1996
zv21_4                  MC58                R. Moxon
zv96_4                  2996                Our collection

Group A
zv22_4                  205900              R. Moxon
z2491_4                 Z2491               R. Moxon / Maiden et al., 1998

Group C
zv24_4                  90/18311            R. Moxon
zv25_4                  93/4286             R. Moxon

Others
zv26_4ass               A22 (group W)       R. Moxon / Maiden et al., 1998
zv27_4                  E26 (group X)       R. Moxon / Maiden et al., 1998
zv28_4                  860800 (group Y)    R. Moxon / Maiden et al., 1998
zv29_4                  E32 (group Z)       R. Moxon / Maiden et al., 1998

Gonococcus
zv32_4                  Ng F62              R. Moxon / Maiden et al., 1998
zv33_4                  Ng SN4              R. Moxon
fa1090_4                FA1090              R. Moxon


References: 






Seiler A. et al., Mol. Microbiol., 1996, 19(4):841-856. 


Maiden et al., Proc. Natl. Acad. Sci. USA, 1998, 95:3140-3145. 



The amino acid sequences for each listed strain are as follows: 
>FA1090_4 (SEQ ID NO: 893) 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>Z2491_4 (SEQ ID NO: 894) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV01_4 (SEQ ID NO: 895) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV02_4 (SEQ ID NO: 896) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAARNEGAAK* 

>ZV03_4ASS (SEQ ID NO: 897) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV04_4 (SEQ ID NO: 898) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV05_4 (SEQ ID NO: 899) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV06_4 (SEQ ID NO: 900) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTAHKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV07_4 (SEQ ID NO: 901) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV08_4 (SEQ ID NO: 1107) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV09_4 (SEQ ID NO: 902) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV10_4 (SEQ ID NO: 903) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV11_4 (SEQ ID NO: 904) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQVELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV12_4ASS (SEQ ID NO: 905) 

MKTFFKTLSAAALALILAACGGQKDRAPAASASAASENGAAKKEILFGTTVGDLGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV13_4 (SEQ ID NO: 906) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV15_4 (SEQ ID NO: 907) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAGNEGAAK* 

>ZV16_4 (SEQ ID NO: 908) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV17_4 (SEQ ID NO: 909) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV18_4 (SEQ ID NO: 910) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 
>ZV19_4 (SEQ ID NO: 911) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV20_4 (SEQ ID NO: 912) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV21_4 (SEQ ID NO: 913) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV22_4 (SEQ ID NO: 914) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDLVKE 
QIQPELEKKGYTVELVEFTDYVRPNLALGEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV24_4ASS (SEQ ID NO: 915) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVELVEFTDDVRPNLALGEGELDIIVFQHKPYLDDFKKEQNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV25_4 (SEQ ID NO: 916) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV26_4 (SEQ ID NO: 917) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV27_4 (SEQ ID NO: 918) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV28_4 (SEQ ID NO: 919) 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV29_4 (SEQ ID NO: 920) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQVELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV32_4 (SEQ ID NO: 921) 
MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>ZV33_4 (SEQ ID NO: 922) 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>ZV96_4 (SEQ ID NO: 923) 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

Figure 8 shows the results of aligning the sequences of each of these strains. Dark shading 
indicates regions of homology, and gray shading indicates the conservation of amino acids with 
similar characteristics. As is readily discernible, ORF 4 (SEQ ID NO: 216) is significantly 
conserved among the various strains, further confirming its utility as an antigen for both 
vaccines and diagnostics. 
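The degree of conservation among strains can be summarised by listing the alignment columns at which the sequences differ. The following Python sketch is illustrative only and does not reproduce the method used to prepare Figure 8; the function name `variable_positions` is an assumption introduced here, and the input is taken to be pre-aligned sequences of equal length. The example uses the ten residues that begin the second line of three of the sequences listed above.

```python
def variable_positions(seqs):
    """Return the 1-based columns at which pre-aligned sequences differ."""
    if not seqs:
        raise ValueError("at least one sequence is required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be pre-aligned to equal length")
    # zip(*seqs) yields one tuple of residues per alignment column.
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


# Ten-residue segments copied from the listings above.
segments = {
    "Z2491_4": "QIQPELEKKG",
    "ZV01_4": "QIQAELEKKG",
    "ZV02_4": "HIQPELEKKG",
}

print(variable_positions(list(segments.values())))  # [1, 4]
```

In this short segment only two of the ten columns vary (Q/H and P/A), consistent with the high conservation described above.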



It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 